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siaropsis 

of the 

Ph. D, Dissertation 
on 

ESTIMATION AND PREDICTION IN LONGITUDINAL STUDIES 

by 

Markand Pushpak Oza 
Depajrtment of Mathematics 
Indian Institute of Technology/ Kanpur (India) 

September/ 1986 

Very often one comes across studies vhere observations 
are made repeatedly on the same experimental unit over a 
period of time* Such observations may be available for a 
nxamber of experimental units# These studies are generally 
known as longitudinal studies* There are certain advantages 
of such studies over cross-sectional studies# for details 
see- Ware (The American Statistician, 1985, 39,95-101)* The 
ert^hasis of longitudinal studies is on the explanation of 
changes within units* The purpose of the present study is 
to fit an ^propriate model that can summarise the information 
contained in the data* Estimation of model parameters is, 
of course, an inportant aspect in the analysis* But the main 
interest of the investigators is usually in prediction rather 
than in estimation, because prediction involves the response 
which investigators actually measure* With this in mind, 
considerable exrphasls is laid in this thesis on prediction 
problems* 



The thesis consists of five chapters# A chapterwise 
summary is as follows : 

Chapter 1 is introductory in nature^ It ejqjlains the 
purpose of this study# Relevant literat\3re on growth curves 
and repeated measurement studies is briefly reviewed# Expectation 
and maximization (EM) algorithm for obtaining the maximum 
likelihood (ML) estimates/ an iterative technique that will 
often be used during the course of the thesis/ is also explained# 

In Chapter 2/ we consider the choice of optimum degree 
of polynomial and problem of prediction under uncertainty 
about the degree of the polynomial in random coefficient 
regression models# Two strategies for selection of covariates 
for covariance adjusted estimators are also discussed# Some 
of the results of this chapter have been published in Oza and 
Shukla (Corramin# Statist# •-Theor# Meth#/ 1986/ 15 ( 5)/ 16 53"'1664)# 

A generalised multivariate growth curve model/ which 
allows different design matrix and different dimension of 
regression parameters for each characteristic/ is consided 
in Chapter 3# The random regression coefficients for each 
characteristic may also depend upon different set of aiaxiliary 
variables# This generalises the model of Reinsel ( J ♦ Am# Stat# 
Assoc#/ 1984/ 79/ 406-414) in many ways# The ML procedure is 
adopted for estimation of the parameters. The problems of 
prediction and derivation of its asymptotic mean square error 
(MSE) are considered* A Monte Carlo simulation study is carried 



out to conpare the performance of various predictors including 
an empirical Bayes type predictor* A possible application in 
preharvest forecasting is considered in great details* 

Anderson ( J. Am, Stat. Assoc., 1973, 73, 371-378) has 
considered a multivariate autoregressive process for repeated 
series. From the practical point of view there is a need to 
generalise this model to a two-stage model. This is done in 
Chapter 4. In the first stage the usual regression model is 
considered and in the second stage the regression parameters 
of the first stage are assximed to follow a first order auto- 
regressive process. An EM-algorithm for the confutation of ML 
estimates of the model parameters is derived and prediction 
under this model is discussed* Some sinple estimators, as 
an alternative to ML estimators, are also considered* A Monte 
Carlo simulation study is performed to cortpare these methods 
of estimation. Some possible ^plications of such two— stage 
models are discussed. 

A gensral mixed model with autocorrelated errors, which 
is similar to one of the models considered by Swamy ( Statistical 
Inference in Random Coefficient Regression Models * 1971, 
Springer-Verlag, Berlin), is considered in Chapter 5* The ML 
estimators are obtained using EM-algorithm* Prediction using 
these estimated values is considered® An expression for its 
approximate MSE is derived* A Monte Carlo simulation study is 
performed to coirpare the b^aviour of estimator* and predictors 
based on the ML method and some other methods* 



chapter I 


INTRODUCTION 


1 , 1 INTRODUCTION 

It is very difficult to give a universally acceptable 
definition of the term longitudinal studies# However, there 
ars certain features which are common to all longitudinal 
studies* In these studies measurements on the same character— 
istic are obtained from the same experimental units at two or 
more occasions* Data on several units may be available and 
units may be assumed to be selected randomly from a population 
of units# 

The purpose of the longitudinal studies is usually 
to explain the changes taking place in an experimental unit 
between occasions and attriloute these changes to certain 
background variables* There are certain advantages of 
longitudinal studies over cross-sectional studies# The 
response may be affected by uncontrolled variables and in 
such situations the longitudinal studies generally produce more 
efficient estimates of within unit changes than the cross- 
sectional studies# "When the population is of limited size, 
as in the case of a hospital, there is no choice but to depend 
on longitudinal studies# In a recent paper Ware (1985) has 
given a good description of application of linear models in 
longitudinal studies# However, there ars some difficulties in 



longitudinal studies, such as the observations may be correlated, 
the CO variates may vary with time, observations may be missing 
at some points etc»,» This makes the usual statistical analysis, 
developed for independent observations, unsuitable for the 
present purposes ■» If the main purpose is to study the variability 
over xanits then the longitudinal studies may not be very efficient^ 
The longitudinal studies are also time consuming* 

Soma work on hybrid of longitudinal and cross-sectional 
studies, called mixed studies, has also been reported in 
literature* One of the first such studies is due to Rao and 
Rao (1966) and they termed it as linked cross-sectional study* 

Such studies are less time consuming and provide quick estimates 
of growth rate than that by a purely longitudinal study* For 
more details on method of analysis one may refer Rao and Rao 
(1966), Woolson ^ al» (1978) and Woolson and Leeper (1980)* 

For a review of the literature on the topic see Dielman (1983)* 

In longitudinal studies, the observations on an 
experimental unit are ordered in time and hence form a time series* 
But for a time series analysis to be valid one needs to have 
many observations - certainly many times more than what we have 
in a t 3 ^ical longitudinal study* Therefore, there is a need to 
develop separate techniques for analysis of the longitudinal data* 
A number of specialised methods are used in the analysis of the 
longitudinal data uiider the title of growth cuarve models* 



The analysis simplifies considerably when observations are 
made on each experimental unit at a set of fixed time points# 

For the purpose of analysis, the usual method of 
fitting a model is followed# The reason to fit a model is 
that it can summarise the infoirmation contained in the data# 
Estimation of model parameters is, of course, an iirportant 
aspect in the analysis# But since early-saventies the eirphasis 
has shifted frcm estimation to prediction# This change has 
taken place due to recognition of uhe fact that the main interest 
of the investigators is usually in prediction rather than 
estimation, because prediction involves the response which 
investigators actually measure# With this in mind, considerable 
errphasis is laid in this thesis on prediction problems# 

The structure of the thesis is as follows : In section 
1#2 of this chapter we review the literature on growth and 
repeated measurement models# Motivation for the present work 
is discussed in section 1#3# Ejqjectation and maximization (EM) 
algorithm for obtaining the maximum likelihood (ML) estimates 
is explained in section 1#4# In Chapter II, we consider the 
choice of optimum degree of polynomial for prediction under 
uncertainty about the degree of the polynomial in random 
coefficient regression (RCR) rtKDdels# Two strategies for selection 
of covariates for covariance adjusted estimators are also 
proposed# Estimation and prediction in a multivariate RCR model 
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when designs are different for different characteristics is 
considered in Chapter The dimension of regression parameters 

and set of covariates for each characteristic may also be 
different# A two~stags autoregressive model for repeated 
measurements is proposed and studied in Chapter IV* A general 
mixed modal with auto correlated erirors is considered in Chapter V* 

1,2 REVIEW OF THE LITERATURE 

Model and estimation : In growth curve model studied 

frequently in literature we have N independent individuals, 
each providing an observable p coupon ent random vector y having 
multivariate normal distribution with dispersion matrix Q and 
unknown mean X^* Here X is a known pxq design matrix of rank 
q < p and 3 is an xinknown q conponent vector of regression 
parameters# This is called a one~sarrple growth cxirve model# 
Wishart (1938) is credited for using such growth curve models 
for the first time# His analysis begins by replacing large 
number of obseivations on each individual by a few coefficients 
of fitted orthogonal polynomial# The means of growth rate and 
its change are coxtpared by univariate analysis of variance# 

Box (1950) suggests the use of analysis of variance technique 
to the first difference of successive observations when the 
assxonption of uniform covariance structure is valid* He hints 
at the multivariate techniques when the assunption of uniform 
covariance matrix is not valid# Rao (19 58) introduces the 
concept of time metameter i*e# of transforming the variables so 
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that the data are linear with respect to the transformed 
variables^ The corrparison of growth parameters is then made 
by the use of analysis of variance or Wilhs' A -criterion# Rao 
(19 59 ) has obtained the least square estimator of £ when dispersioi 
matrix Q is unknown but estimate of it is available# He also 
considers the problems of inference regarding ^ and setting 
confidence bands for a linear function of 3 # 

A k— sartple version of this model is considered by 
Rao (1961 )• Glesser and Olkin (1966) consider the inference 
problems for such models in detail# They obtain simultaneous 
maximum likelihood estimators of regression parameters of k— 
sarrples i.e# and the common dispersion 

matrix q # They also obtain the distribution of these estimators# 
They derive the likelihood ratio test for testing equality of 

and distributions of likelihood ratio test 

statistics* 

Potthoff and Roy (1964) introduce a polynomial grawth 
curve model which is a generalisation of the multivariate 
analysis of variance (MANOVA) model# The model is of the form 
E(y) =s X6W where Y is pxN matrix of observable random variables 
whose columns are independently distributed with unknown disperslt 
matrix q , W is a kxN matrix across individuals and 0 is a qxk i 

matrix of unknown parameters# Their method of analysis is a ‘ 

( 

multivariate analogue of weighted least square method# The 
most severe limitation of this ^proach is the absence of a ! 
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well defined matrix G occuring in the transformation to the 
usual I4A.N0VA model# For any choice of symmetric matrix G such 
that G ^ and (X'G exist, the estimator of 6 is unbiased 

and is the most efficient when G = Q# Khatri (1966) tachles 
the problem of arbitrary G in Potthoff— Roy analysis and obtains 
maximum likelihood estimator of 6 when Q is unknown* He also 
obtains likelihood ratio test of : CGD = O, a linear 
combination of some estimable fianction of 6, and its confidence 
bands* 

Some work on incorrplete growth data is also reported 
in the literature* Trawinski and Bargman (1964) develop 
procedures for estimation and tests of hypotheses in one sample 
growth curve models* Klienba-um (1973) considers the case of 
missing observations in the generalised growth curve model of 
Potthoff and Roy (1964)* Analysis of inconplete multiresponse 
designs is considered by Srivastava (1966,1968) and Roy et al * 
(I97l)* All of them consider fixed effect models with general 
dispersion matrix and view it as a situation in the set up of 
MAnovA model* A review of MANOVA of repeated measurements is 
given by Timm (1980)* 

Some amount of variability in observable random 
vectors y^ may be due to random variation in the individuals 
themselves* The random coefficient regression (RCR) rrradel 
proposed by Rao (196 5) takes into account the variability due 
to individuals* At the first stage, the model has the following 
form/ 
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y. = XJ3 . + e. 

fo-i- <oX foi 


with 


ECe. 10.) =s0; var(e,.l0j) or^I* 

rv>X (S>X fv>l- <N>1 


This is kno^m as the conditional model considering 0.'s as fixed# 
If we assume l^.'s to be random variables with 

CO X 


E(0 .) = 0 -J var(0j) = P j cov(0j#0 .) =0, i ^ 3 } 

^ X <s> co X fvj X ^v> J| 


cov(0-/e. ) = 0 for all i and j, 

foX roj ^ 


then we get an RCR model.# This model imposes a structure on 
dispersion matrix that Q t= XPX' + a^J.t Rao (196 5) obtains 
least square estimators of 0j and 0# If an unbiased estimator 

fO ^ CO 

of Q which has a Wishart distribution independent of y^'s 

coX 

exists/ then he derives the likelihood ratio test for testing 
appropriateness of the model and the structure of o ^ Fisk 
(1967) also considers the problon of estimation in RCR models.* 
Properties of various estimators of the popxilation regression 
parameters are also studied* h. thorough reviev; of the RCR 
model has been given by Spjotvoll (1977)/ he mentions many 
possible applications of the model and certain problems for 
future research* 

Rao (1966) considers covariance adjustment in Potthoff- 
Roy model of growth curves* Least square estimation and 
inference on 0 for growth curve models when q has various 

fs> 

structures have been developed by Rao (196 7)# In the seme papeiTif 



8 


he discusses the possibility of using a subset of covariates 
for covariance adjusted estimator to obtain more efficient 
estimators of 3 and setting confidence bands for some linear 
function of He obtains the necessary and sufficient 
condition for the unweighted least square estimator to be the 
best linear unbiased for © in a generalised growth curve model# 
The condition is 

Q = X r + Z A Z' + ; (l*2-l) 

where Z is any p x (p-q) matrix of rank p-q such that X'Z = 0 
and r and A are arbitrary unknown positive definite matrices# 
Grizzle and Allen (1969) note that the Potthof f-Roy^ s estimator 
with G = S*j, where 

NS*=: Y [l - W'(WV/)*’^W] Y' f 

Rao^s (1966) estimator with all covariates, and Khatri' s (1966) 
estimator are identical* Grizzle and Allen (1969) also consider 
the issue of selection of covuristes^^ 

Swamy (197l) considers a more general version of RCR 
model by allowing X to be different for each unit but the 
dimension of X is the same for all the xinits# He derives a 
two-step estimation procedure# First, the individual regression 
parameters are estimated by the least square method# Then the 
estimates of the population regression parameters and variances 
are obtained. This procedure may sometimes give a negative 
estimate of variance- The estimators are consistent as the 
length of the series becomes large# 
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A model of considerable interest for our work is the 
one where he assumes that the errors follow a stationaxy 
autoregressive process of order 1 (Swamy, 1971/ Section 4*4) f 
He considers that each unit has its own autocorrelation 
coefficient and obtains consistent estimators as the length of 
the series becomes large. 

In a recent paper. Carter and Yang (1986) consider a 
RCR model where the length of each series of observations may 
not be the same. They obtain consistent estimator of P which 
is modification of estimator given by Swamy (1971), for this case 
when the smallest of the lengths of series may be small but 
number of units is large. 

Khatri (1973) derives the likelihood ratio test for 
testing three types of dispersion structures viz., (i) independence 
of two sets, (ii) the sphericity of the dispersion matrix and 
(iii) intraclass model of the dispersion matrix# He also obtains 
the null distribution of the test statistics# 

Bayesian approach to growth curves is considered by 
Geisser (1970), Lee and Geisser (1972, 1975) and Fearn (1975,1977)# 
Geisser (1970) provides Bayesian Justification for the covariance 
adjusted estimator of Rao (1967)* Lee and Geisser (1972) 
recommend the use of posterior mode rather than posterior mean 
as it considerably slrrplifies the conputation procedure# The 
likelihood ratio test for 
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: Q = X r X' + Z A Z'^ + 0^1 (r and A positive definite and 

unknown ) 
vs 

: Q arbitrary 

has also been developed* Feam (1975) views the RCR model as 
a hierarchical linear model and performs a Bayesian analysis* 
A case when popiilation growth curve is of a lower dimension than 
the individual growth curves is dealt with in a later paper by 
Fearn (1977)* 

Rao (1975) has considered eirpirical Bayes (EB) 

estimators for RCR model for estimation of coefficients P.'s 

2 

when F, P and a are unknown* It may be noted that the 
estimators are not EB in the strict sense of the word because 
the estimators of s are obtained by replacing the parameters 
by their sarrple estimates* In the rigid sense (e*g* Maritz, 
1970), the parameters have to be estimated from the past 
occurrences of the same situation* Rao (19 75) makes a very 
thoughtfxal suggestion of introducing a shrinkage parameter, 
to be determined suitably, when the unknown parameters are 
r^laced by their estimates* The value of such a parameter 
which minimizes the average mean square error (MSE), where 
average is taken over all in the sarrple, has been obtained* 

Strenio et _^* (19 83) consider a Potthoff-Roy type of model 
for mean and structxared dispersion matrix of RCR model. In 
their model, individual regression parameters are assumed to 
be functions of some fixed known covariates which may be 
different for different individuals* 
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In a series of papers^ Reinsel (19 82/ 1984a/ 19 85) 
has considered generalisation of growth curve for one characteri- 
stic to a multi characteristic case# Reinsel (1982) considers 
generalisation of Potthoff— Roy model to multicharacteristic case 
and develops procedures for testing the hypothesis about various 
linear combinations of elements of fixed effect parameters# A 
similar generalisation of RCR model has been considered by 
Reinsel Cl9S4a)# The choice of optimum value of constant 
niultiplier when estimated values are used in place of unknown 
parameters and approximate prediction MSE have also been considered* 
A case when design matrix may not be the same for all tne 
individuals is considered by Reinsel (1985)# An expression for 
approximate prediction MSE has also been derived* 

Reinsel (1932) shows that when the dispersion matrix 
is of the form 

Q = (X0 I) r (X' (g) I) + (z ^i) A(Z^ X) + ii® v^)/ 

Cl«2#2) 

where denotes a Kronecker product/ which is a multivariate 
generalisation of (1*2# l)<the unweighted least square estimators 
are the best linear unbiased* A likelihood ratio test for 
testing the structure (l#2#2) of dispersion matrix is derived 
by Chinchilli and Carter (1984). 

Anderson (1978) considers a multivariate autoregressive 
process for repeated series and obtains the ML estimates of the 
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parameters* Procedures to test the nature of autoregressive 
parameter matrix are also developed# 

Laiid and Ware (19 82) consider a family of tx^jo-stage 
models for longitudinal studies* Use of EM algorithm for ML 
and restricted ML estimation p3X)cedure is explained# This is 
xllustratsd by examples# 

A priori reduction in the number of covariates for 
models x-^ith different structures of the dispersion matrix is 
considered by Varbyla (19 86)# The reduction in nu]:Tiber of 
covariates is shovjn to depend on the structxxre in inverse of 
dispersion matrix and the form of the design matrix# In 
particular^ the cases of autoregressive processes and RCR 
models are considered# 

For more details on models used in longitudinal studies# 
one may refer to Geisser (1980)# Wool son and Leeper (1980)# 

Khatri (1985) and Ware (1985)* 

1*2*2 Prediction : Before we move on to review the work on 

prediction in longitudinal studies# it would be proper to make 
the distinction between estimation and prediction* Estimation 
is a procedure of determining the value of fixed parameters 
whereas prediction is concerned regarding the value about the 
random event itself* Prediction in time series literatxrre is 
commonly knox/m as forecasting# A good review on some early 
procedures of forecasting in xmivariate time series is given by 
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Nowbold and Grangai* (1974)# Howevar, in the present thesis 
WG are not interested in time sorios techniques# 

Some work on prediction in linear models is also 
report s_d in literature# Consider the model 


y + e * 

fo to 

whare y is a px' vector of observable quantities^ 5 is a qxl 
vector of regression parainsters and e is a pxl vector of errors* 
Assume that 

varCy) = ^ 

The purpose is to predict y,. as a linear fiinction of y^ where 


^ + e 


rc 


and X is a known qxl vector, var(v ) = a ‘V-„ and 


* 


cov(y,y ■) = a 


^ Cjl ^ 

Let I* Y he a linear predictor of i#e,# 


ro 


y =£' y=jl' X ^ + S,' G# 

W <s> fo ca 

It can be seen that y.^ is unbiased for x'^0, the e3q)ected 


fS> ^ fV3 


value of y.,, if 

*sr 


= X • 

ro fo^ 


The variance of y is given by 


(1*2#3) 


var(y„) = E(y - x' iP)' 
= SL' A # 

^ X JL. «S> 
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This measures the mean sc^uare error (MSE) of y from its mean 
h* ^ E* However/ a more relevant measure of deviation of 
from its predictor y^ is given by 


eCk. ~ yJ^‘ 2 [ ^ X p) + a' X ^ - y )]^ 

“ ro |s> 

'"’^[1' .V-S 2 - 2*' Xl9J- 


Hence the best 3. inear unbiased predictor is obtained when S, 
is chosen in such a way that it minimizes (l»2*4) subject to 
( 1 * 2 * 3 )* It is easily seen that the best choice of 2- is given 

ro 

by 


11 ;;i: 


5, = v.t V.,,, 


The predictor is thus obtained as 


= 

tsj ro 


+ 


S 2 


V^JCy - 


X P) 


= E(y ly)/ 

CO 


Gl*2.5) 


The predictor y„ is called the best linear unbiased predictor 
of y • If P and V are not' toiown then it is natural to put 

ro 

the estimated values of these parameters in (1*2*5)* There 
will be an increase in MSE due to the use of estimated values 
in place of the parameters* This type of predictor is considered 
by Goldberger (1962)* 


To gat sc.-fEi general idea of the work done on prediction 
in linear models one may refer to Bibby and Toutenburg (1977)* 

A problem of considerable interest in longitudinal 
studies is of prediction# The work done on multivarxate missing 
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data problems and problem of growth curves are the two specific 
instances which show the importance of prediction in such 
studies- Some early work is due to Lee and Geisser (1972^1975)/ 
Rao (1975) and Ream (1975)- Since then it has received 
considerable attention of the researchers, for example see 
Rao (1977^1981), Geisser (I98l), Copas (1983), Reinsel (I984a,b,c^ 
1985), Rao and Boudreau (1985) and references therein- 


Suppose that N units are obseirved at p time points 
and observations are represented by p component vectors 
yj (i = 1,2, ---/N)# Q?wo types of prediction problems are 
considered in literature- For (N+l)th unit, vector 
observed at P 2 ('^ p) time points and we wish to predict the 
remaining P 2 C= p*^^^ values* This is called the conditional 
prediction- Another type of problan is of predicting the 
values of measurement at (p+h)th time point for these N units* 
This is called the problem of predicting the future observations* 


The problem of conditional prediction for the 
generalised growth curve model of Potthoff and Roy (1964) is 
treated analytically by Lee and Geisser (1972)- Conputational 
feasibility of predictors based on various ^proaches is 
discussed in detail by Lee and Geisser (1975)* This is 
illustrated nximerically by working out two exaitples- 

Rao (1975) and Ream (1975) also consider the problem 
of conditional prediction in RCR models when the parameters are 
known* Rao (1975) suggests the use of estimates of the parameters 
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when the parameters are unknown and obtains MSB of prediction 
when they are known. This will give an underestimate of actual 
MSB because it does not take into account the variability due 
to the use of estimates of the parameters. 

Reinsel (1984a) considers prediction in a multivariate 
random effects model when the random coefficients depend on 
some covariates» An expression for approximate prediction MSB, 
in cases whore estimates of unknovm parameters are used, is 
derived. Reinsel (1984b) considers this problem in a multivariate 
linear model with general dispersion matrix and obtains the 
prediction MSB of the usual least square predictor after using 
estimates in place of the parameters* He shows that the 
increase in prediction MSB due to use of estimates is substantial 
if the sairple size is small. This work is extended by Reinsel 
(19 84c) where he derives an ^proximate prediction MSB in the 
generalised multivariate linear model when q has a linear 
structure* Models considered by Rao (1965), Khatri (1966) and 
Grizzle and All^n (1969) are particxilar cases of this model* 

Rao (1977, 1981 ) considers the problem of prediction 
for future time points* Rao (1977) considers prediction in 
three situations viz*, l) prediction for a particTilar unit, 

2 ) prediction for a unit drawn from a specified population and 

3) simultaneous prediction for a number of units, in a linear 
model* The performance of various methods, such as the best 
linear unbiased predictors, ertpirical best linear predictor and 
the best homogeneous linear predictors, are coirpared by applying 
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them to the data of Elston and Grizzle (1962)# Rao (19 Si) 
considers the simultaneous prediction of future observations 
on a number of units in a polynomial growth curve model using 
their past values* 

Rao and Boudreau (1985) consider the problem of predlctlnc 
tuuure observauions in factor analytic model* In such models 
the prior knowledge of orthogonal functions is not assumed and 
both the orthogonal functions and factor variables have to be 
estimated* They also discuss a method of selecting an appropriate 
model# 

A model free data analytic approach to conditional 
prediction is considered by Geisser (1981)# A detailed study 
of issues involved in prediction and shrinkage in the regression 
models is considered by Copas (1933)# Subset selection in 
regression analysis is also considered# 

A potentially useful method of prediction is based on 
Kalman filters. Kalman filter is a recursive, unbiased least 
sc[uares estimator of a Gaussian random signal (Wegman, 19 83)- 
Since most of the work on Kalman fxlter, including the original 
wor> of Kalman (I960), is reported in engineering litarattore 
it is not very popular among statisticians# Harrison and 
Stevens (1971,1976) use Kalman filter for short term forecasting# I 
Duncan and Horn (1972) observe similarity between RCR model and | 
Kalman filter model# The recursive estimates occurring in the ' 

Kalman filter models are seen in the terms of regression theory* ; 
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Meinhold and Singpurwalla (1983) demonstrate that Kalman 
filters are very much similar to linear models* Connection 
between Kalman filters and Goldberger-Theil estimator is shown 
by Diderrich (1935)# Sallas and Harville (1981) use the Kalman 
filter technique for recursive estimation in mixed linear models# 

1.-^ MOTIVATION 

From the review of available literature it is clear 
that most of the researchers have assumed a prior Icnowlcdge 
about the degree of polynomial or the dimension of the regression 
parameters- Among some of the exceptions are the works of 
Kao (198l) and Lee and Tan (1984) for fixed parameter's case* 
Rao's (198l) objective function for selection of the degree 
of the polynoitd.al is the minimisation of the coirpound mean 
square prediction error (CMSPE), Lee and Tan (1984) consider 
this problem in a Potthoff-Roy model by a Bayesian m^hod and 
a method based on likelihood ratio test# Though RCR models 
are being used extensively, it seems, no work has been done on 
the selection of the degree of the polynomial (or the dimension 
of the dispersion matrix of random regression coefficients)* 

This is considered in Chapter II of the thesis* Prediction 
under the uncertainty about the degree of the polynomial in RCR 
model is also considered# The iitportance of co variate selection 
for covariance adjusted estimators is well recognised* The 
most widely used method of covariate selection is the one 
proposed by Rao (1967)# Grizzle and Allen (1969) have used the 
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method In the analysis of some data* This method is not based 
on any optimality criterion* Two strategies which are based on 
A- and D-optimality criteria used in design of experiments are 
proposed in this chapter* 

Strenio (I98l) and Strenio al* (1983) consider a 
growth curve model where they assume that the regression parameters 
are functions of fixed }cnown covariates* The mean has structure 
of Potthof f-Roy model and dispersion matrix has the structure 
of RCR model* This case has been generalised to multicharacteris- 
tic case by Reinsel (19 32# 19 84a) which is a very significant 
contribution in the analysis of longitudinal data* Por sinplicity 
the polynomials of the same degree have been fitted for all 
the characteristics# and the set of co variates is common for 
all the characteristics* This may not be the most parsimonious 
way of fitting the model* Moreover# there is a need to relax 
these conditions as will be illustrated by an exairple in 
preharvest forecasting* These considerations have motivated 
us for the work done in Chapter III* 

Regression models using environmental factors as 
indcapendent variables are widely used for forecasting crop yield* 
The regression coefficient of such a fit# in a way# r^resent 
the * state' of the environment which is not directly observable* 

It can be measured by observing the yield* Now# due to industrial 
activity and other reasons such as fumes and pollutants being 
thrown in the atmosphere# there may be deterioration in the 
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* state' of environment over the years# This change may be 
represented by an autoregressive process similar to that of 
Anderson (1978)# Thus there is a need to combine these two 
approaches simultaneously to form a two-stage model# In the 
first stage a usual regression model is considered and in the 
second stage the regression parameters are considered to have 
an autoregressive structure# Such a formulation is similar 
to Kalman filter type of models and is the topic of Chapter IV_# 

In most of the work on RCR models it is assumed that 
the eirrors are independently distributed or distributed as 
N(0/(J v) with a known for exanple see Rao (1975)# If V is 
a general unknown matrix then it may not be possible to estimate 
V because all the parameters may not be identifiable# If V 
has a suitable structure so that the parameters are identifiable 
then it may be possible to estimate them# In some cases it may 
be proper to assume that the errors of the same unit show an 
autoregressive pattern# Swcimy (1971# Section 4# 4) and Mansour 
et (1985) consider such structures# Swamy (1971) considers 

a stationary autoregressive process of order 1 but allows the 
autocorrelation coefficient to change from unit to unit# His 
analysis is valid if the length of the individual series is large 
which may not be a realistic assuirption in a longitudinal study* 
Mansour ^ (1985) consider a nonstationary autoregressive 

process but the diversion matrix of individual regression 
parameters is not very general# In Chapter V a mixed linear 
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model with the errors following a stationary autoregressive 
process of order 1 with the same autocorrelation coefficient 
for all the units is considered, 

1,4 EM ALGORITHM 

The concept EM algorithm is due to Derrpstor ^ al, (1977) , 

It is an iterative technique of obtaining the ML estimates. The 
algorithm is so named because it consists of two steps/ the 
expectation (E) step and the maximization (m) step. As such/ 
an algorithm is a special process of solving certain type of 
problems. The EM algorithm docs not actually specify the 
sequence of steps to be performed and so in the strict sense 
of the word it is not an algorithra but a procedure or a technique^ 

Some of the reasons for popularity of this technique 
among the statisticians are the following : 

1 , It achieves synthesis for ML estimation/ which is a very- 
satisfactory estimation procedure/ by exhibiting various 
methods as special cases of the general EM algorithm. 

Wide range of problems fall within periphery of this 
algorithm. Thus this is a remarkable procedure, 

2, The general EM theory, which shows that each iteration 
increases the likelihood, can be applied, 

3, The e 3 <pressions for defining iterative steps of the 
algorithm have meaningful statistical interpretations# 

Thus it is appealing. 
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In the EM setting the observed data are broadly seen as 
incomplete data coming from a larger system of quantities* 
ICnen we talk of complete and inconplste data we visualise two 
sample spaces (say 36 and ^ ) and a many-one function from 36. 
to ^ / i»a* for an observed y e ^ # x is not known completely# 
What is known is that x will lie in some region determined by 

cot 

the observed y (say (y) ) • Putting differently/ x is not 

CO 

observed directly but only through y# Ws say that x is 

ro CO 

conplata data and y is inconplete data# 

fo 

The family of complete data, say f(xl<p)/ is ass\imed to 

CO CO 

be known# The incomplete data specification family g(y|(p) is 

fs> fO 

derived so that 


g(yl 

€0 


= r f(xHp) dx, 

3t(y) ~ ~ ~ 

CO 


( 1 #>4# 1 ) 


EM algorithm aims at locating the value of g which maximizes 
g(yl(p) given the observed y# This maximization is achieved by 

CO CO fsj 

iteration and each iteration consists of two steps* the E-step 
and the M-step* 


Denpstsr, Laird and Rubin (1977) have shown that the EM 
algorithm can be applied to any family of distribution functions# 
But for the sake of simplicity in under stand tog the attention 
is restricted to the exponential family# Let f (xl <y‘) belong to 

CO fO 

a regular exponential family of distributions i#a# 


f (xl <p) = bCx) # exp [(p' (t(x) )] /a(<p) . 

CO CO CO ^ CO fO CO CO 
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It is well-kno\vn that t(x) is a complste sufficient statistic* 

ro CO 

The problem of EM algorithm may be stated as follows* We are 
given the family of complete data, an observed y and a many— one 
function from X to ^ * The aim is to obtain the value of 
that maximizes g(yl <p) subject to the condition 

CO CO 


The 


Let 

E-step 



bo the currant valu 


CO 

estimates the complete 


a of <P after p— iterations# 

CO 

sufficient statistic t(x) 


CO CO 

by finding the posterior expected value of the sufficiant 
stau^.stic, i*a.. 


t(p) 






The M— step simulates ML estimates which would have been arrived 
at if the conplate data were known# But the data are ineonplete 
and M-step can be carried out only in a proxy# The proxy is 
produced by E-step* The M-step obtains as a solution 

CO 

of 

E(t(x) 1<P) = t^p\ 

CO ro CO CO 


This is the usual form of normal aquations for obtaining ML 
estimates given the data from an exponential family of 
distributions* 


In general EM algorithm, no reference is made to 
exponential family* Denpster et (1977) have proved that 

a general EM algorithm produces a sequence that results in 
non— decreasing value likelihcxsd function and that a convergence 
is achieved# 
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Like any other method of numerical maximization the EM 
method can also converge to local maxima rather than the global 
maxima* The conditions for existence of unique maxima are well 
known. In the presence of local maxima the choice of initial 
value is very crucial* In cases considered in this uhesis the 
starting values are obtained by sinple estimators which have 
some desirable properties such as consistency and in some cases 
unbiasedness* Moreover, random checks are also given by other 
numerical maximizing algorithms and in all cases considered 
they converged to the same values as given by EM algorithm* 

^he time taken by EM to converge is much smaller in corrparison 
to other methods* Moreover, EM has certain other advantages 
such as meaningful statistical interpretation of each step and 
increasing nature of the likelihood at each iteration* 

Though the EM algorithm is a powerful conc^t for sinplifying 
the corrputations of ML estimates of the parameters there are 
some limitations* It locates the region of convergence very 
quickly but after that the rate of convergence is slow* It is 
not easy to get the variances of the estimates so obtained* 

Deipster ^ (1981) consider estimation via EM algorithm 

in linear variance corrponents models* Strenio (1981) uses this 
method for estimation of variances in hierarchical linear models* 
Laird and Ware (1932) use EM algorithm for estimation in mixed 


linear models* 



CHAPTER II 


PREDICTION UNDER UNCERTAINTY OP DEGREE OF POLYITOMIAL IN 

random coefficient regression models 

2,1 introduction 

Wliile fitting a poli/nomial to the data it is usually 
assumed that the degree of the polynomial is taiown* Misspecifi- 
cation in the case of fixed coefficient regression inodels has 
been considered in the literature by Hocking (1974) and 
references therein# The topic is also related to selection 
of variables for prediction on which a vast literature is 
available (Allen# 1971 and Hocking# 1976)# Some work on 
determining the degree of polynomial in growth curve inodels 
has also been reported in literature# For fixed parameter 
case# Rao (1981) studies this problem from the point of view 
of minimizing the conpound mean square prediction error (CMSPE), 
Lee and Tan (19 34) consider the problem of estimation of degree 
of polynomial in a general growth curve model by a Bayesian 
method and a method based on the likelihood ratio test# Some 
work on the factor analysis type of growth curve model has 
been reported by Rao and Boudreau (1985)# In this chapter# we 
consider the optimum choice of degree of polynomial and the 
problem of prediction under the uncertainty abovet the degree of 
the polynomial in random coefficient regression (RCR) models# 
Some strategies for the optimum choice of covariates are also 
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discussed* Some results of this chapter have been published 
in Oza and Shukla (1986)* 

The format of this chapter is as follows* In Section 2*2 
wo propose a RCR model and consider two situations which may 
arise du^j to uncertainty ab:5Ut the model* Section 2*3 gives 
the mv;an square error (MSE) of predictors for the two situations 
considered in Section 2*2 and Section 2.4 contains conparison 
of those errors* In Section 2*5 we consider the effect of 
using estimated parameters for prediction and corrpare the 
performance of optimum predictors with other similar predictors 
by a cross validation study on a real set of data* The pixjblem 
of selection of degree of polynomial is considered in Section 
2.6* Section 2*7 contains some results on optimum choice of 
covariates* General discussion and summary are given in 
Section 2*8* 

2,2 model 

Let us consider an observable p-vector random variable 
yj/ i = l/2,***/N, which can be expressed as 

ts> ^ 

yj = . + Cj t i 5= 1/2/ * * * / (2*2*1) 

* 0 "^ 

where X is a known design matrix of order p x q# which is same 

for all i/ 8.- is a q xl vector of random regression parameters 
*-> ^ 

and is an independent p x 1 error vector with mean zero and 
dispersion matrix a I* 

For random vector 3* we assxxme t,hat 
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3 . = 3 "^ + 7 ). , 1 = 1 , 2 , 

^ fsj fO 1- 


X7ith E(3j_) = 


q-q-i 


gi 

, 0 
to 


qi 


= 3 I var(3j) =var<il. ) = 

ro ro X fsjX 


q-q. 


p - - 

11 

^12 

^21 

^22 

^1 

q ~ q . 

( 2 - 

2 * 2 ) 


Partition X and 3^ as X = (X^rX^), where X^ and X^ ara p x 
and px (q-q- ) matrices, respectively, and 3. = (3i ,*31.)' where 

^ X CoXX |>0 X 

£li ^ (q-q^) X 1 vectors, respectively* Then 

the model (2»2*l) can be written as 


h = ^i£ii + ^2£?.i + Si 


= t ^ Si ^i / ^ * 1,2/*#*,N« 


iSli 21,2 i 

In this model the dimension of popxilation regression 
parameter 3- is smaller than for individual regression parameter 
3j's* Such a model has been considered by Pearn (19 77)* 

ro X 

In the present study we shall consider two situations 
which may arise due to uncertainty about the model* 

Case 1 ; Here one considers an incorrect dispersion matrix 

!i 

re 

^1 


of 3^ by assuming (2»2*2) as 

3 “ 




0 

fO 


' varjCPi) = 


F O 

11 


O 0 


(2* 2* 3 ) 


In other words, in this case one wrongly considers the dimension 
of individual regression parameters same as that of population 
regression parameters, i*e* q^* 
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2s Hers one considers an incorrect diinension of the 
population regression parameters by assuming (2*2*2) as 




; var, (J3 . ) 


(2*2*4) 


In this can 0 one v^rongly considers the dagrcic of population 
parameters to be same as that for individual regression 
parameters, i.e* q* 


In thv^ above cases and var^ , ^2 
expected values and variances correcponding to cases 1 and 2, 
respectively, while under the correct model they are denoted 
by E and var as given in (2»2.2)* 


2*3 PREDICTORS AND THEIR ERRORS 


2* 3*1 Case 1 : Let be the least square estimator of 
from the i-th individual i.e* = (X^ ^l^i* 

and are known, then the optimum linear predictors of 


•1^ 

t'3-j is t'P., (Rao, 1975) where 


=|ii - 'Hu - ^2.3.1) 

and = (X' x:^)*’^» 

The above estimator is also a Bayes estimator for an 

fo jLx 

appropriate lorlor* The minimum mean square error (MMSE) 
predictor for a future observation on (N+l)-th individual, 
given first p observations corresponding to a point 
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(belonging to row space of X^^)/ is given by 

3an+i - • 


(2.3»2) 

O 

Since the values of parameters o . and F ^ are 

coX XX 

not known we substitute their estimated values based on 
Yi/Yn/ ' '/y-KT* One might introduce a constant multiplier c 

-t CO 

in the second term of (2»3»2) when parameter values are 
replaced by their estimated values, as done by Rao (1975), 
and choose the value of c which minimizes MSE, In the present 
case, such an optimum c depends on unknown F, unlike that of 
Rao ( 1975 ), and has not been introduced here# However, 
introduction of such a constant c has been considered for model 
of case 2 in section 2# 5 where it does not d^end upon unknown 
parameters# There is some loss of information due to not using 
yjj^. for estimation of parameters# The expressions for 
prediction errors are diffic\iLt to obtain when information on 
(N+l)-th individual is also used* It is expected that the loss 
due to not including in estimation is not serious when N 

is moderately large# 


2 

We shall assume that on unbiased estimator of a , 
denoted by a ", is available# This may bo obtained by fitting 
a higher degree polynomial to individuals data and considering 
pooled residual sum of squares Wj^ distributed as 0^^(f)# 


When unknown we may estimate them by. 
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- N ^ 

N3. = E 13.. , 
i=l 


N 


(N-l) (P..+a^U ) = E (‘P, = B., 

Thtese ostirnr.-cors e-re the maximum likelihood estimators of 0. 

ro •* 

O 

eind respectively^ for the Case 1* Substituting 

those vjstimators in (2 = 3»2) we get a predictor of 

%+1-jf “ ^ol ^ ^ ' (2#3#3) 

If 71. is distributed as a rnu3.tivariato normal 
distribution then is distx'ibutod as w(B/N— where 

P = U^X'CXPX' + 

Wo assume that -w^ and are independently distributed# We 
also note that if B^ W(p;n— l,q^) then, 

E(B~^Bj^) = (N-2){ (N-q^-l)(N-q^-2)(N-q^-4)}‘“^P“^. 

Using the above results and noting that under the true model 

^N+1 5oi £i ;!3 n+i ' 

whore ^ = ^2ol*^o2^ belongs to row space of X, wo obtain 


where, for any integer k. 
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Nj^(k) = [ (f+2) (N^-l)(N-2) £ fN(N-k-l)(N-k-4)} ~^-2 ] , 


No(k) = (W--l)(N-k“2)“^, V= (z^.-X'X.U.z^J. 

^ «o02 2 1 ls,ol 

2*3*?' Case 2 ; Let ^ . be the least square estimator of ^ from 

«s> ^ CO 

the i-th individ-ual i«e* = (X'x)~^X''y. # The MMSE predictor 

ro- 

for a future observation on (N+l)-th individual, given first 
p observations corresponding to the point z , is given by 

^ti+i = SiC is+i - 


where U = (x'x)“-^. 


When 3 and P are unknown we estimate them by n 1? = S 3 ^ , where 

ro fsjt 1 

the summation extends over all N individuals, and 

/\ O W ^ ^ m, f 

(N-i)(p+a u) = S = B* 

4 A ro^fSS 


Again these estimators are the maximum likelihood estimators 
for the respective parameters for Case 2* 

Substituting these estimators in (2»3*5) we get a predictor 
of ^N+l 





(N-l) ] • 


Under the assurrption of multivariate normality 
B W(P + a^u; N-l,q) and thus we obtain 

= cr^+0^zluz +N\, (q)a^z^u(p+a^u) “^uz.. 

isiO fsj,0 X 1^,0 ^ o 


(2*3.6 ) 
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2,4 comparison op PREDICTION ERRORS 


We are interested in corrparing the efficiencies of 
predictors Up to this time we have derived 

expressions for MSB's for any N and for a general regression 
setup, Hovxever/ these expressions are complicated and to ge'-' 
some insight wo shall consider the case of orthogonal polynomia,! 
when N is large. Under this situation the expressions (2,3,4) 
and (2*3,6) simplify to 

liSE(y J = a^'+or^z' .z^--a^z'.(P, 

cvjOXcvXdX (sjOX XX X 2 22^02 




)= a ^-+0 ^ z' z _ -0 ^z' ( F+a I ) “^ 2 ^ . 

roOro^J foO 


It is not difficult to see after some manipulation that 




2 /T> P=-i] -1 

^ hil [^2.1=^+ ^ -2„ 

- ”2,1 22 J 


(2,4,1) 

2 ••I 

where ^ 2 •1*^21 matrix of regression coefficient 

,/v y\ /\ 4^ 

of 3 o,- on 3 -, where 3 , = ^0 1 conditional 

fsj2X I^,1X ^X roXl 

/N ^ 

dispersion of given by 

'^ 2.1 = ^22 + 
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The first matrix of (2»4*l) is always positive 
definite and the second matrix is indefinite and nothing can be 
said in general. However, when ^2*1 small and has 
large characteristic roots then the second matrix is likely 
to be positive semidefinite and will be a more efficient 

predictor. This is V7hat one would expect under this situation, 

to 

lu is of interest to consider the property of 
when the model of Case 2 is correct i»e. 


Yi = ki3 + X T). + e- , i = 1,2, 

to ^ to"*" 




In this model the degree of population and individual 
parameters are the same# The predictor is no more 

unbiased and this introduces a bias of amount This 

CO toZ 

CO N * 2 

will increase the MSE(yj^_^^^) by an additional term iv* ^ 2 ' 
in (2,3#4), However, the above conclusions are likely to hold 
even in this situation. 


Suppose one is interested in conparison of MSE^s for 
large value of N, in the neighbourhood of design points, as 
done by Allen (1971) and Hocking (1974)# It is not difficult 
to see that the difference between average MSE-'s, average taken 
over all design points, is given by 


+ Tr {E22(X5X2-X'X^(X'X^)“^X^X^)} ] 


# 
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whore 


^2.1 = X,_, G = X' aXj.j = X'_J OXj, 

and Q = XFX' -f a^I« 

Again wg see that when F-„ does not have small characteristic 

/i ^ 

roots then the last two terms of the above exprossion are 
positive and largo in coirparison to the first t-erm which is 
negative, thus mahing to bo a better predictor;* 

2,5 EFFECT OF ESTIFATION OF pARAItETERS ON PREDICTORO 


In this section we consider the problem of prediction 
for model (2*4#2)* The MMSE predictor for a future observation, 
having structure 


yN+i = Un+1 + sn+1 ' 

2 * lo 

when cr and F are known, is given by where 

~ a^u(F+a^u)”^(i^^ H3)* 

coN+1 ^N+1 ^ 

2 * 

Here wo shall consider a modified estimator when cr ,0 and F 
are estimated from the previous observations y-, yo/ • • -/Y n* 
Reinsel (1984b) has observed that there is a considerable 
increase in variance when parameters are to be estimated from 
a sairple* We modify predictor ^>7 introducing a constant 

c, as z'p„,. where 

* Cvb»s>N+l 


£n+ 1 =iN+l - , ‘2.5.1) 

the superscript e denotes errpirical Bayes type estimator* 
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Rao ( 1975 ) has obtained constant c so as to minimise 
® where summation extends over all N individuals^ 

In the present case xve find c so as to minimise the prediction 
error for any future observation where 

y\ 

= ^o£n+ 1* shown that 

E{yN+3_*-yN+i(c)} “ = a^+a^^Uz^+N2(c^'N^-2cN)0\^u(F+a^u)“^Uz^ 

(2* 5# 2) 

where 

N 3 = (p-q)CN-q- 2 )“^, = (Np-Nq+ 2) (N+1 ) (N-2 ){ (N-q-l) (N-q- 4 )}”^ 

Therefore^ the optimum value of c that minimizes (2»5»2) is 
given by c^ = Putting this value of in( 2*5*2) we 

obtain 

^(yN+i-yN+i(=»^ = 

( 2 * 5 * 3 ) 

The value given by Rao (1975) is C|^ = (N-q— 2) 

(Np-Nq+2) ^ which differs considerably from c^ for small values 
of N but for large N they are nearly equal* 

Exampl e : For enpirical comparison we consider the data 

provided by Strenio ^ ^* (1983) (see Experiment 1 at the end | 
of thesis) which is a modified version of Box (19 50). We fit ' 
a first degree pol^momial (qs2) for each rat* We omit the data 
on a rat and estimate the parameters from the observations at ; 
t = 1 ( 1)4 on other rats# For different values of c we predict j 



Tshly 2 t Observed snd Predicted Weights (in dm) and Mean 
Suuare DevLabion (HSIOat t=5 for Different 
Malue's of C 



Observed 


Predicted Weidhts 

Ra 1 

Ueldhts 

0 


C 

C* 




R 

O 


1 

176.00 

158.50 

158.64 

158.57 

158.76 

p 

174.00 

180.00 

173.08 

176.80 

167.69 

3 

201.00 

179.50 

170.37 

175.27 

163.26 

4 

157.00 

159.00 

156.36 

157.78 

154.30 

5 

148.00 

145.00 

148.69 

146.71 

151.56 

6 

152.00 

166.00 

153.73 

160.32 

144.18 

7 

156.00 

126.50 

139.24 

132.40 

149.15 

8 

154.00 

151.50 

164.97 

157.74 

175.45 

9 

138.00 

137.00 

140.49 

138.62 

143.21 

10 

177.00 

165.00 

168.61 

166.67 

171.41 


Msn 


203.50 172.22 172.33 240.76 
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the weight of omitted rat at p=5 given its weight at t=l(l)4# 
This is done for each of the lO rats# The predicted values 
are then conpared with known weight of the rat* The predictor 
corresponding to c = 0 does not take into account the 
inforraation provided by the ouher individuals while the one 
corresponding to c* = (N-i-l ) (1%)—Nq) ^ takes the estimated 
values as r.--al valuGs#Table 2*1 contains observed and predicted 
weights of rats at. t = 5 and their mean square deviation 
(MSD) = { Yj_~y^(c)} ^10, for c=0, Cj^/Cq and c*# 

The prediction error calculated by using expression 
(2# 5# 3 ) after substituting the estimated parameter values is 
195*29/ which compares reasonably well with the cross validation 
value of 172 #33# 

The difference between MSD^'s for c=C|^ and c^ is 
negligible# However/ using information on previous observations 
makes errors much smaller as indicated by MSD^ s corresponding 
to c=0 and c=Cj^ (or c^)# The high MSD corresponding to 
indicates that introduction of an optimum c is important when 
N is moderate# 

2.6 SELECTION OP DEGREE OF POLYNOMIAL 

Let y^.^ be the measurement at time t (t = l/2/».»/p) 
on individual i (i = 1,2 /##./N)# Assuming a polynomial RCR 
model/ we consider the selection of optimum degree of polynomial 
for the purpose of prediction. Without any loss of generality 
we can work with orthogonal polynomials. Suppose that response 
y^jj^ is expressed as 
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+ Sti ' t = 1 = 1,2,...,N C2.6.1) 

Vv’here ^(t) = (^f^Ct » , •, Sg('t))^ is a (s+l)x 1 vector 
of coefficients of orthogonal pol^^nomials* We assume that 

^^®ti^ = 0 , var(e^jj^) = cj^, cov(e^^/ e^^^) = 0 for t ^ u, 

cov(e, e ) = o for ± ^ j, eO-) = 3 , var(jS.) = E, 

UJ. ro J- 

and covO .^e.) = 0 for all i and 1 , 
where 

Under the model ( 2 # 6 *l), the least square estimator 
of £i is given by 



v/here 

=[®(l)/ W(2 ),<#../ W(p)]' is apx(s+l) matrix# We 
2 

know that when a and P are known the optimum linear predictor 

cv) 

Vv Vv 

of is g'P . where 3? is given by the e:5^ression similar to 

ro fs> (N»l ^ 

O 

(2*3#l)» When and P are unknown/ Rao (1975) substit\ites 

<S 3 

suitable constant multiplier of the following unbiased estimators, 
which are well known/ 

N 3^ = S ; ■ 

CO «N> } 

N p - ^ S .2n 

N(p-s~l) =[e E yh “ E e(3\ ) 1? ; 

(N~l)CF+a^l)„ =E0^ - 3*) (3^ - 3*)' = B,,; 

w «saX CO CO-i. CO 
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in the expression for P . to obtain the arpirical Bayes estimator 
of g'Pj as g'3?* Here is defined as 


£|=/,-c(s) .*2 (g^^-g*) 


(2-6^2) 


whero/ for any integer k. 


c(k) = H(p-s-l) (N~k-3)(Hp-Ns-N+2)“‘^- 


2»6*1 Choice of the degree of polynomial for extrapolation : 


Here we consider the optimum choice of degree of 
polynomial for predicting Yp^j^i “ 1/2 ,».»,n) simultaneously# 
The criterion used is conpound mean square prediction error 
(CMSPE)^ i»e. the total mean square error of prediction over 
all N individuals* 


If we choose only the first (d+l) terms in (2*6*1), when 
in fact the true model has (sfl) terms, then the minimum MSE 
predictor of is 


^p+li 




where HfCt) = (f(t)j , and =? ^^li ^ ^fi^^ have 

partitions of dimensions d+l and s-d, respectively* Using 
results of Rao (1975) it is not difficult to see that the 
optiinum value of constant to be substituted in (2*6*2) is c(d) 
and 




.-p.,y = 


Ncr^I 


c(d) 


a^(Pll+a^i)"^ 
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where == partitions of of dimensions dtl 

a.nd s~d, respectively, and P is partitioned as 



d+1 s*^ 


In this case the CMSPE is 
CMSPE(p.}-l) 

= + W(ptl)' [N0^i-c(d)a^(P^^+a^i)“'^ ]£(p+l)j_ 

t N £(p+l)' 

+ (p+l)£ [c(d)a^(P^j_+c;^l)“^P^2 ] 


Na^l-c(d)a‘^(Pjj+(7^'i)’"^ c(d)a^(Fjj_+a'‘i)"‘^F^2 

O W % 

ssNC +^^(p-hl) ro(p4*i) 

fO ^ 

L c(a)a=F3^(F^j«2i)-l N(F22+|,2g,p J 

( 2 *6 <• 3) 

In practice the minimization of (2 #6 # 3 ) over d cannot bo carried 
2 

out because 0 ,i3 and F are not known# We substitute unbiased 
estimators of quantities on right hand side of (2#6#3) to obtain 
an estimate of d# 

Rao (198l) has considered this problem in the fixed 
coefficient setup by using least square estimators# The case 
of Rao is a particular case of this \dien F is a null matrix 
and c(d) = 0# Under these conditions the expression (2*6# 3 ) 


reduces to 
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Na^Na^ ^(p+1)' W(p+i), + s kp+l)l 3 

€NJ ail. <0 X ^ <S} «!& C«0 


2i 


which agrees with equation (4*1.2) of Rao. 


2.6 » 2 Choice of d for 


Suppose our aim is to predict the value of response 
in the neighbourhood of time points Considered in the design. 

In such a case it may be appropriate to average the CMSPE over 
all design points as has been done by Allen (197l) and Hocking 
(1974). Since l(t)'s are coefficients of orthogonal polynomials 
it is not difficult to observe from (2.6*3) that the average 
CMSPE, average taken over all p point s^ is 


Average CMSPE = p“^[N(p+d+l )a^-c(d)cr^ Tr(F^j^+a^l)“VN(F22+t>2t>2^] 

(2.6.4) 

Again, in the absence of known values of parameters we substitute 
unknown parameters by their unbiased estimators. We obtain 
values of d by numerically minimizing (2.6.4). 

2*^*3 Example : For the purpose of Illustration we use the 

data on weights of 13 male mice measured at intervals of 3 days 
over 21 days as reported by Williams and Izenman (I98l) (see 
Experiment 2). We estimate the parameters by taking s = 4 and 
using data upto 18 days. The values of estimated CMSPE obtained 
by using ( 2 . 6 . 3 ) for extrapolation purposes at day 21 are reported 
in column ( 2 ) of Table 2.2. The estimated average CMSPE, to be 
used for the purpose of interpolation, obtained from (2.6.4) are 



Table 2 ♦ 2 J Compound Mean Snuare Predictiori Error(CMSPE) 
f'or Polynomials oF Different Decrees 


Desiree oF the 

CMPSE at 

Average 


Pol Pi iomi a1 

t=5 

CMPSE 

SSD 

( 1 ) 

(2) 

(3) 

<4> 

0 

0*7929 

0*7336 

1*8042 

1 

0*9170 

0*0215 

0*2169 

2 

0*3659 

0*0055 

0*0846* 

3 

0*1317* 

0*0048* 

0*1267 

4 

0*3429 

0*0054 

0*4355 


* denotes the minimum value 
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presented in column ( 3 ) of Table 2*2» The sum of square of 
deviations (SSD) of predictor based on (2*6»2) for day 21 and 
its observed value are reported in column ( 4 ) of Table 2 * 2 * 

If we had data upto IS days and wanted to predict the 
weights on 21st day then, looking at the values of CMSPE in 
coloinn ( 2 ) of Table 2*2, we would have fitted a third degree 
polynomial# Hov/cver, from column ( 4 ) of Table 2 #2/ we see that 
a quadratic is a better fit than a cubic# This small disagreeme 
may arise due to rea.sons such as inadequacy of the fit of RCR 
model and small amount of data available* For 
interpolation purposes a cubic rrodel should be fitted# 

2.7 SELECTION OF CO VARIATES 

In this section we consider the selection of covariates 
for covariance adjustment which are optimum in some sense# We 
follow procedure similar to that of Rao (196 7)# Rao's (196 7) 
procedure is as follows# 

Let T# and T„ be two vector statistics of order m and r 
such that E(T- ) = T and E(T«) = 0 where t is a m vector of 
unknown parameters# If cov(Tj^,T 2 ) ^ 0 then an estimator of 
which is better than T- can be obtained# Let 

roX 


var(T) s = var 

*^1 

= A = 

^11 

^12 

ro 

£2 


- ^21 

%2 - 


If we consider an estimator of x as 

CO 
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5* = - "12 h 

then it is not diffictilt to shovr that t-?*- has smallsr variance 

ro 

than If A is not knoxTO but an estimate of a / given by 


u 


u. 


11 


u 


12 


u. 


u. 


_ 21 '"22 _ 

is available, than it is natural to consider an estimator of 


as 


T- - T^. 


Rao (196?) has shown rhat if 


T 

ro 


N ^ ( 

m+r 


T 

<V> 

0 


, A) 


and 


U •^W(A^f,m+r) 


then 


E(t) = T } var(T) = [a 

»o ,o «o f^'r-l ‘- 


11 ^12 ^22 ^2ll * (2*7.1) 


The estimator T may not be necessarily better than Tj^_. Infact 

✓s 

T- is likely to be better than t when A-o is small and r is 

fsa X fv> X X 

large# In such cases it jnay be advantageous to choose a subset 

of T 2 which may be highly correlated with For a model 

with 


ECy) = XT J var(y) = XA X' + a^I ~ Q / say, 

ro csj ro 

where y is an observable p component random vector and X is a 
knox^^l pxm known design matrix, Rao (1967) recoiranends the use of 
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characteristic vectors corresponding to the dominant characterist:'. 
roots of [l ■“ XCx^x) J U as covaria-tes, where U is the 
estimated dispersion matrix of T, given by 


fU == 


(x'x)“^ x'sx(x'x)“^ (x'x)“^x'sz 

z^sx(x'x)"^ 


'7.^ Q'y. 


(2.7# 2) 


where Z is a p xr matrix of rank r such that Z-'X = 0, and 


S W(Q;f,p), 

This, however, is not based on any optimality criterion# One 
may seek the choice of cover iates based on trace minimization 
and generalised variance minimization criteria which are 
parallel to A-optimality and D-optimality criteria in design 
of experiments# 

2#7#1 Choice of covariates- for trace minimization criterion : 


Suppose one is interested in selecting a linear combinatio' 
of coirponents of so that the trace of the dii^ersion matrix 
of covariance adjusted estima-bor of T is minimum# Let £ i T„ 
be such a linear combination# Thai 






"^11 

^12 &>1 

£' T„ 

n*l ^ 

ro 

0 

^ Mi 

; 

hi 

^22 


We define a covariance adjusted estimator of t as 


T ^ ~ "^1 

roX 


^12 ti ^22 ii^ ii 


From expression similar to (2#7>*l) it follows that 
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E(tJ 


.r = I '' M £i'ri ''22 il ''21] 


For minimizing Tr [var(Tj_)] with respect to it is not 

difficult to see that the best strategy is to choose S, - as the 

<N>X 

vector corresponding to the largest root satisfying 


^ ^12 "■ ^22^ i. = 2' 

In the absence of the hnov/ledge of A we substitute its 
estimate U« 


2*7»2 Choice of covariates for generalized variance minimization 
criterion : 


In this subsection we look for the best strategy in 
terms of minimum generalised variance of covariance adjusted 
estimator^ say t of t • Let 


X2 


= “ 


U 2, 

12 ‘-a 


“22i2>"'‘i:2 


coZ 


be such a covariance adjusted estimator* It follows that 


E(T ) 


T, var(T2) _ ^ 22 ^ 2 ^ 


-1 


^2 ^21 1 * 


By observing that lvar(^)l can be written as 
ivar(L)i = (1 _ 


^2 “22 J ^2 


Zi Z 


it is not difficult to see that the best strategy is to take ^2 
as the vector corresponding to the largest root ^2 satisfying 

(^21 nil h2 - n22> ® = 2? 
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Again/ in the. absence of the knowledge of A / we substitute its 
estimate U# Here (p 2 is nothing but the square of canonical 
correlation between T. and T„, 

In both cases discussed above (sec* 2»7*1 and 2*7*2) more 
covariates can be chosen by considering the vectors corresponding 
to the next largest roots* 

2.7*3 Example : For the purpose of illustration we use the 
data of ramus height/ measured in mm/ on 20 individuals over 2 
years at half yearly intearvals/ as reported by Elston and 
Grizzle (1962) (see Experiment 3)* A linear curve (m st 2') is 
fitted to the data using coefficients of orthogonal polynomials^ 
The matrix S is taken as the matrix of sum of squares and 
cross-products corrected for mean with 19 (=f) degrees of freedom* 
Here r is taken as 2* The matrix U can be obtained by using 
(2*7*2) where X and Z are suitable matrices formed by using 
coefficients of orthogonal polynomials* 

Table 2*3 summarizes the results corresponding to three 
methods of choice of covariateS/ viz*/ Rao^s(R)/ trace 
minimization ( a) and generalised variance mlndmization(b) ♦ It 
may bo noted that the results for no covariates are the same 
by these three methods? as are for two ( all ) covariateS/ as they 
should be. The methods (a) and (b) are the best of the three/ 
in their respective sense/ when one covariate is used* However/ 
there is a slight increase in the average variance with the use 
of Covariates for these data* This shows that the correlations 



Table 2<,?{Avei'aj^e and Generalised Variances by Three Methods oF Cfjoice of Covariates 
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are not large enough so as to justify the use of the covarxates/ 
as for as the average variances are concerned# 

2# 8 DISCUSSION AND SUMMARY 

In most of the irork on the RCR models r^orted in the 
literature, it is assumed that the dimensions of population 
and individual regression parameters are the same# But this 
may not always be truG» Feam (1977) has considered a model 
wherein the dimension of the population regression parameters 
is smaller than for individual's* It is possible that the 
dimension of the popiilation regression parameters is smaller 
than for individual's but the analyst may wrongly proceed to 
analyse by assiming that they are the same# In case 1 we have 
studied the effect on prediction error when the dimension of 
individual's parameters is larger than what has been considered 
for analyses purposes# In case 2 the effect on prediction error 
has been studied when the dimension of population parameters 
is considered to be larger than the actual one# The judgen^nt 
regarding the relative efficiencies in these two situations 
should be based on ^2^1' matrix of regression coefficients 

of^lOn f and On the basis of expression (2#4#l) we 

can say that if has large characteristic roots then the MSE 
for case 2 is likely to be smaller than for the case 1# 

The problem of prediction for a new individual is 
considered in Section 2 #5# Rao (1975) has obtained the value of 
a shrinkage parameter c to be used when estimated values are 
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substituted in place of the parameters/ that minimizes 
S 2 £t(£i-£i)} , whore the summation extends over all individua.*’ 
A constant multiplier c which minimizes E 

a new individual has been obtained* It can be seen that these 
two values differ considerably for small N but for large N they 
are nearly equal, as expected* 

The assuirption regarding the knc'./ledge about the degree 
of the polynomial is seldom likely to hold in practice* A more 
realistic situation is one in which the analyst has to decide 
the degree based on the data* This problem in the setup of RCR 
model has been considered in Section 2*6* The choice of the 
degree can be made on the basis of values of estimated CMSPE 
[equation (2*6*3)] or average CMSPE [equation (2*6*4) ] as the 
case may be* 

Another iirportant aspect in the analysis of polynomial 
models is the selection of covariatos for C30variance adjustment 
to obtain more efficient estimators* This problem has been 
considered by Rao (1965,1967) and Grizzle and Allen (1969)* 

Rao (196?) has suggested a method ^ich is not based on any 
optimality criterion* Two methods are proposed in Section 2*7 
which provide the best strategy for the choice of co variates in 
terms of minimizing the average variance and the generalised 
variance , respectively, of the covariance adjusted estimator* 



chapter III 


ESTII'iATION Al® PREDICTION IN A MULTIVARIATE RANDOM COEPPICIENT 
REGRESSION MODEL WHEN DESIGNS ARE DIPPERENT POR DIPPERENT 

CHARACTERIST ICS 

3,1 INTRODUCTION 

The RCR model proposed by Rao (196 5) has received wide 
attention of the researchers working iji the area of growth 
curve analysis. Rao (196 5,196 7,1975), Eisk (1967) and Swamy 
(19 71) have laid a strong theoretical foundation for studies 
in RCR models. Since then, this model has been generalised in 
various directions# Por example, Strenio ^ al. (19 83) extend 
the work by introducing co variates in the model# In a series 
of papers Reinsel (1982, 19 84a, 1985) generalises the growth 
curve and the RCR models to a multivariate case# He inposes 
certain restrictions on the model such as the response for 
each characteristic be ^proximated by polyncsnials of the same 
degree and the set of covariates be common for all the 
characteristics# This leads to a considerable sinplif ication 
in the results# However, there are situations vAiere it may 
not be ^propriate to assume the same degree of polynomial for 
all the characteristics# One such situation has been described 
in the followings 

Let us consider a problem of predicting the yield of a 
crop which is available at the time of harvest #., ^l^^^ecords on 
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growth characters/ such as height/ diameter and leaf width/ may 
be available at various time points* There is no reason why 
pol];Taomials of the same degree shoxild be good fit to all the 
characteristics* I^reover/ some variates that may be covariates 
for height may not be good co variates for, say, leaf width* Thus 
there is a need to generalise the model of Reinsel (19 84a) to 
allow for different degrees of polynomial for different 
characteristics* 

In Section 3*2 we propose a generalised multivariate RCR 
model and obtain the maximum likelihood (ML) estimators, which 
are iterative, for the parameters of the model* In Section 3*3 
expression for approximate mean square error (MSE) of the 
predictors, based on the ML estimators, has been derived# A 
non iterative estimation procadtire for a particular case of the 
model, which may provide initial values for the ML procedure, 
is suggested in Section 3*4* A Monte Carlo simulation study 
to compare the performance of the predictors based on the 
ML estimators and some noniterative estimators is described 
in Section 3*5* The resxilts of this study are discussed in 
Section 3*6* In Section 3»7 the results of a worked o^It 
example are given# Section 3*8 contains discussion and 
summary of the work done in this chapter# 

3,2 MODEL AND ESTIMATION OP PARAMETERS 

Let us assume that there are N individuals, and m 
characteristics are observed on each individual# The 
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jth (j = characteristic is observed at pj occasions 

and its response can be ^proximated by a polynomial of 
degree in time* Let 

^ + Sij ' i = 1 J = 1- •••/”>? (3.2.1) 

where y,. . is an observable p .— vector random variable corresponding 

J 

to the observations on the j— th characteristic of the i— th 
individual; A . is a known p .x q . design matrix, and ti.- ^ is a 

J J J 

q. xl column vector of random rearession coefficients corresponding 
J 

to the j-th characteristic of the i— th individual* We assume 
that 


E(e, .l7r_. J = 0 ; var(e. .Iji . .) « a? I. 


We shall view 7i. • as a random parameter vector related 
to covariates as 

ZXI = ^Ij ii ^ ^ = J = (3*3-2) 

where X. . is a q,. x d . matrix of known covariates and r. is 
a dj -vector of fixed parameters* 

The equations (3»2#l) and (3*2*2) can be written as 


y. = AjIj + ej f 714 = X. r+ T). f i = (3*2*3) 

where 

Jl ” ! ^ = 'Slag(Aj,...,Ajj)f * 

Xi = aiag(X^3^,...,X^);i, = (Ji ■^yf gi = 
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1 

ro-** 4 (sj 


and 


= ° I = V = diag{o2lp^,...,4i^ ); 


m p. 


m 


E(r) ir) = 0 } var( 7 ].ir) 


'rs> X fo 


Note that y. and tt.- are vectors of order p( = EP') and 

ro*J“ J 

q( = E q j ) / respectively* 


Now we may write (3*2.3) 


as 


where 


y^ = Ax.r + u-. 

X foX 


= A T} . + 

foX <s,l 


with 


E(u. ) =0 ; var(uj) = AFA' + V = n . say# 

CN> X (SJ <«oX ac w j. 

When are jointly normally distributed, the likelihood 

function is 

, - 4 n , 

L(r,<,j,Flyj,,,.,j^j)« lol .expC-^S^Q u^), 

where U4 = y . - AX. r 

coX isX X csi 

After some matrix maniptilation this may be written as 
2 loq lCt ,CT # A I Yi / • • •/ 

(fSJ fs^X 4-0 

=: const#-N loglVl-N log lAI+N log lUl- SuJV*'^u^+ su^V~^AUA'v"'^ u^ 

- s u'.v“Wa"'^ua'v“^u,, 

«oX fs^x 
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where U = (A'v“^A)“^ ; A = F + u* 


The maximum likelihood estimators of 7 , a and c? 

are obtained 

ro ^ J 


1 = (2xJa”^x^)~^ SXJ ^ ; 

(3-2#4) 

A = N“'^,UA'v“^(Sr,.rOv”^AU ; 

(3-2#5) 

= {N(pj-q^)}“^..Tr[S^.{I-A.(A'Aj)“^A'}] ,1 < j < 

m; (3# 2 #6) 


where 


31 . = (A'A) 
ro-*- 


“1 


A'y, 




r . . = y . .—A .X. ■ 7 . 

coaj jiij j ij 


r. = y.—AX. 7 « 

^ ^ ir. -• 


An estimator of F may be obtained as F = A - U* For the sake 

of simplicity we have maximized the likelihood with respect to 

2 

parameters Y, Oj and A* Consequently, this does not give ML 
estimators of the original parameters and, in fact, estimate 
of F may fail to be positive definite in some cases* Rao (1975) 
and Reinsel (19 84a) have also used these estimators and they are 
noniterative in their situations# 

The aquations (3»2*4) and (3* 2# 5) are solved iteratively 
, till the convergence is achieved# Note that the right hand side 
of (3#2#6) can be written as 


^2 2 

and the estimator Cj of Cj is noniterative# 

A particxilar case of interest, which introduce a 
considerable siirtplif ication, is when all Aj's are same and 
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all the characteristics have common covariates/ as considered 
by Reinsel (19 84a) • For this case the equations (3 *2 #4) and 
(3 *2 *5) arc noniterative* The estimator of y is given by 


r = (SX<X.) 


-1 


SX'. 71.- 

X coX 


The estimators are the same as obtained by Reinsel (1984a) • 


Since the variance conponent estimates obtained by the 
ML method are even and translation invariant functions of 

/s. 

observations/ it can be seen that t is an unbiased estimator of 

y (Kachar and HarvillS/ 19 81)* Its asynptotic variance is 

(Sx^A X^) » It is not difficult to see that A is a consistent 

estimator of A • Moreover, N(p.-q.)o?<« on N(p.-q.) degrees 

J J 3 J J 1 

of freedom* 

3*3 PREDICTORS AND THEIR ERRORS 


It may be of considerable Interest to predict the S 2 

components ^2 Yq ^ individual, given its 

components y - and information on auxiliary variable X • A 
(PoO X o 

similar problon of prediction is considered by Peam (1975), 
Lee and Geisser (1975), Rao (1975) and Reinsel (1984 a,b) 
and others* 


The minimum mean square error predictor of ^ 2 / 9 iven 
^ol' given by 

where V and A are partitioned as V = diag(Vj,V 2 ) and 


(3.3*l) 
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Corresponding to partitions y - and y ^ of y / 
respectively.# The mean square prediction error (MSPE) matrix 
of (3 ^3^1) Is given by 


® C(2o2-^2^(S=2-5p2>'] 


^2*1' 


(3*3*2) 


Using some matrix identitios, the egruatlon ( 3*3*1 ) can be written 
as 

202 = -^02 [io - ' (3-3.3) 

where 

io = «' ' 

provided all the necessary inverses caxist# A particular case of 
great interest, where ^.nd invertible/ 

is discussed in details in subsection 3*3*1* 

2 

Now suppose that (j = l/2,**,/m) and P are Icnown 
and we are required to estimate y from the sairple# In this 

fN> 

case a natural predictor of y^,, is obtained by substituting r 
in place of y in (3*3*3)/ i*e* 


? o 2 = ■^02 


( 3 * 3 * 4 ) 


Prediction error increases due to uncertainty about y by a 
quantity 


®*ho2“Io2>'io2"2o2’'5 = Hfvar(r)!H' , 


CN> 
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where 

Thus the MSE matrix of (3.3,4) is 

®"2o2"|32^‘X=2"2o2^'* = (3.3.5) 

The predictor of / when all the parameters have to he 
estimated, frorn tho sairple/ is given by 


(3 3,6) 

/S /\ A . A A 

An estimator of P+U^ is given by [\ — U+U^ , where U and are 


obtained by substituting o j in place of in U and 
respectively. It is not difficxilt to see that so obtained 

is always positive definite. This gives 


So 2-2?2 = ^o2(h(wp-l-u^CP+Uj)-l!(l-xJ;). 


Using the approximation of Appendix 3*1 it is not 
difficult to observe that 


when N is large. Moreover (see i^pendix 3,l) 


E£(v 


,0 2 





~ A^2[N“l£U^(P+U3^) 


)'! 

^(F+U)Q(F+U) (F+Uj_) 



+ Tr{ (F+u)q}U^(F+Uj)“^(F+U)(P+U^)“^j}+P+U^(F+U^)“^(T-p) 

+ (T-p)(F+Uj)“\+U^(P+U^)“^D(P+U^)"\] A'2 ? 


n ,* 


( 3# 3*7 ) 
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where Q,P,T and D are defined in "the i^pendix 3*1* This gives 
the total mean square error of (3«3»6) as 


Ef (702-^0* » 


r<P Z 


^ *1 4 


+ Tr{ ( F +U )q } U^ ( F+U^ ) “■" ( p+u ) ( F+U^ ) } +P+U^ ( F+U^^ ) ~^ ( T-P ) 

+ (T-p)(F-1~U^}"^Uj+U^(P+Uj)*“*D(F+Uj)“%^] . (3-3.3) 

3.3*1 a pplication to preharvest forecasting : Consider the 

problem of preharvest forecasting- Measurements on biometric 

characters may be available over a period of time but the yield 

is available only at the time of harvest* Since yield is one 

of the m characteristics/ the matrix A . is of the form 

ol 

= (A^^lo) and so A',A ^ is not invertibLe. The derivation 
ol 11 ol ol 

of prediction variance when A' ^Aq^ is not invertible but ^^I'^ll 
is invertible is considered in the following : 

Without any loss of generality, let us consider the 
case when S 2 == 3 i.e# y^^^ is a scalar and A^^ = * 

The minimum mean square error predictor ( 3 ?^ 2 ^ ^02' lol' 

and its variance are given by expressions (3*3.1) and (3.3*2)/ 
respectively. The predictor of y ^2 when only y has to be 
estimated from the sample is given by 


fs 3 


^o2 


== ^ 2^0 


r 

fO 




-1 


(y_^^*-A y) 
ho 1 o 1 o ,0 


= ^ 2^0 




- X 


bl 


^ X 

y )? 
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where 


71 


pi 




u, 


11 = 




and F and are partitioned accordingly as 


P 





<» 

/ 





m 


It is not difficult to see that the increase in MSE due to 
uncertainty about r is 

^^^ 02 -^ 03 ^^' = Cvar(2))H^ ; 

where 

»1 = ' 

The predictor of ^2 when all the parameters have to be 
estimated from the sairple is given by 



“ 2d 2^0 


X+i2i(%Ai>"^<a,r^oi:> 


Proceeding in the manner similar to what has been used in 
deriving (3#3»7) it follows that 


®‘yo2' 


■^02^ 


N“^[Tr(Qji(Fjj+Ujj) )][(!', 




-1 


F 

^2 


where 


So the ^proximate prediction mean square error is obtained as 

+N ^ [TrCQjjCFji+Ujj^))] [S'22+®m"^21^hl'*'hl^ ^12^ 

f ) 
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3*4 A NONITERATIVE ESTIMATION PROCEDURE 


A particular case of interest is when . = I ^ x'. . and 

-uj 

d . 5= r - X qj / which means that E(jr. .) = X. . y can be expressed 
as ^ = r where F consists of corrponents of 7 so that 

a coluran vector formed by stacking the rows of r is y • In this 
section we propose sirrple noniterative estimators of T and A 

ro 

for the above case* This estimator is similar to that of 
Reinsel (.1934a) for the case he has considered* 

The following are unbiased estimators of 7 and A 

rs> 

I = ; 

G» n =5:(ii-Xif)Ui-x^ IV = (say)f 
where 

) 

where is a submatrix of order gj- x such that 


n. = (A'A)*“'^A'y. / 
= = 


h L' ' 






* denotes the Hadamard product of two matrices* 

Assuming y. and ji. to be jointly normally distributed, 
the conditional distribution of given yj_ is normal with mean 

2 i = ^ " ''"^‘21 " 

' O 

When 7 , A and a. s are assumed to be known, ji. is the minimum 
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MSE estimator of with MSE matrix given by 




'•'I'- \ 

fvj«L roX 
2 

When Y/ A and are unknown and are to be estimated from ihr 

data then it is natural to consider the estimator of 71 ■ as 


given by 


<v> 

71 


W.(G^) 


Tlj^ - C, U A - X. Y ), 

ro*^ JL foX 1 fo 


where constant c^^ is chosen in such a way that 

E [s(^(c^)-Kjl^) (^(Cj)-^i) ] is minimized* It is very difficult 
to obtain this expected value* Instead, we consider the 


estimator of u. as 7 lf(c), given by 

roX roX 


7i?(c) = tt^-c u sr-^CTT^-x,. r), 

ro**- f\ to JL Jm CO 


-1/ 


(3-4*1) 


It is still difficult to obtain the expression for 
[i(c)-7ij ) (jm I 

simplification is achieved if we assume that is ^proximatel 


E [ S(7i^(c)-7ij_) ( 7 i^(c)- 7 ij_)'] • But a considerable airount of 


distributed as W(A/N-r/q), where r = m SEr-j^P* In the above 


k £ 


'k£' 


r is taken as the average of s» In the case considered by 
Reinsel (19 84a) all the are the same and the distribution 

is exact* When r is not an integ^ir, an integer nearest to it 
can be taken- Under this ^proximation it is not difficult to 
see that 

2 

E [S(jIi(c)-7Ii)(7rf(c)-7Ij|^)'^ ^NU -[ g" 




N-r-g-1 


- 2c] U U* 



The optimum value of c is (N-r-g-l), giving the minimum value 
of E [S{7lf(o)-JtwC7t?(o)-^,)'] 


= NU - (N-r-q-l) 


U A“^ U 


= NU - c U U (3.4.2) 

If we do iioc introduce a constant multiplier in ( 3 . 4 .I) but 
only siibstxtute uhe estiraates of the parameters then the estimato 
of 71 4 v^ill become 


5i 


(N-.?)u s:^ Cn. - X. r) 

A co<i 2 . ro 


71 f (N-r). 

roX 


( 3 . 4 . 3 ) 


3.5 MONTE carlo STUDY 


To compare the performance of the noniterative esti- 
mators with ML estimators a Monte Carlo simulation is performed.. 
The performance of predictors based on these estimators and 
several others is also studied. 

The data considered have two characteristics (m = 2 )/ 
one measured at five time points (pj^=5) and the other at four 
tirre points (p2=4)# The individual growth curves are linear 
for both the characteristics (^ 1 =^ 0 '=^^/ i.e. Ti-t and are 
vectors of dimension 2. At the second stage 7 ^ has covariates 
as ^ = (1/Zj_)/Z^ taking values -1,0 and 1 same number of 
times# For 7 I 2 "the covariate is Z 2 taking values 1 and -2 in the 
ratio 2:1. The dimension of r is 6. The model of Section 3.2 
with the following specifications is considered. 



6 4 


r' = (50,3,11,2,3,1,2); 


X ^ 

0 0 1 
0 0 0 


0 0 0 


0 0 0 1 


0 0 
0 Zj 0 



0 0 


11111 
J 2 3 4 5 





1 1 1 .1 

12 3 4 


/A — di3.g' 


The errors at stage two are chosen in such a way that 
the actual variance of i = 1#2/ explains a predetermined 

percent of the final variance of ji. # The errors at the first 

roi 

2 2 

stage, i*a* and are chosen such that (yj.JAj|^), i =: 1,2, 
explains, on an average, a predetermined percent of the final 
variance of y-# 

The data are generated in a manner described above* 

For generating error vectors the subroutine GGNRM of ISML, which 
generates multivariate normal random vectors with 0 mean and 
specified dispersion matrix, is used# We compare two types of 
estimators viz#, the ML estimators (Section 3»2) and the non- 
iterative estimators (Section 3--4) # To study the performance of 
the predictors based on estimates of the parameters obtained 
by these two methods, data for sixth time point of the first 
characteristic, i#e# for = (l/6,0,o) are generated# We 
consider the conditional prediction, i*e* prediction at ^o2 given 
the observed values at all the time points corresponding to the 
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design matrix A, The following five predictors are considered 


for scrutiny# 


( i) 

A^ ^ r . ; 


( ii) 

!i '• 


( iii) 

- 


( iv) 

71 ^ (N-r) 

w <# fsjU- 

« 

i 

(v) 

^02 [ii - 

A (71. - X. r ) ] . 

Predictor 

(i) is based 

on least sguare estimator of individual 


jTj and (ii) is the minimim MSE predictor obtained by putting 
the known values of the parameters# Predictor (iii) is obtained 
by using (3»4#l) whereas (iv) is obtained by using (3#4#3)# 

The predictor (v) is based on ML estimates of the parameters* 

The prediction MSE of (i) and (ii) are given by 

2 

( ^ ®2#1' respectively# The prediction MSE of 

(iii) and (iv) can be obtained by using (3*4#2)/ and that of (v) 
by using (3*3#9)# The estimated values of the prediction M SE 
are obtained by substituting suitable estimates in place of the 
parameters# 

Pour situations are considered for simulation study# 

In the first two cases we consider a moderate sample size of 21 
and in the other two cases a large sample size of 90 is 
considered# For each sample size two situations are considered; 
in one the percent of variability explained is 90 at both the 
stages and in the other this level is 60 percent* For each of 
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■these cases# 100 runs are carried ou't# These "values are reported, 
in Tables 3»1 - 3.3- 

3*6 RESULTS OP MONTE CARLO STUDY 

^ ^ e stimation of parameters : It is evident from 

tne Table 3*1 that the estimates of r are fairly satisfactory for 
ootn tne me'thods^ The coirponentx^ise mean square deviation - (MSD) 
over 10 0 siimalation runs for the noniterative method is smaller 
than that for tne ML method in all the cases except for the 
th3.rd conponent of t when sanple size is 90» However, overall 
they are not very different* Por the same percent of variability 
explained/ the larger sanple size produces smaller componentwise 
MSD/ Por a given sample size# MSD decreases as the percent of 
explained variability increases* Both the results are on the 
expected lines* 

2 2 

Table 3*2 contains the estimates of P. a ^ and a • The 

1 2 ‘ 

2 2 

estimates of cf^ and are the same for both the methods and 
so there is nothing to conpare* The estimated values are close 
to the population values showing that the estimates are 
satisfactory* 

It is difficult to compare the estimated value of P 
obtained by these two methods* For sanple of size 21, all -the 
components of the mean estimated P obtained by the noniterative 
method are nearer (in the sense of absolute difference) to the 
true F than those by ML me-thod* However, for the sanples of 
size 90 the position is not clear* The estimate of F^^, 
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Table 3.3JMe3n Souare nevi 3 tion<MSD) and Mean Sauare Error(MSE) for Predictors <i) to (v) 
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dispersion matrix of the regression parameters for the first 
characteristic, obtained by the noniterative method is nearer 
to its population value than the one obtained by the ML method* 

But for ^2 2 ' dispersion matrix corresponding to the 

regression parameters for the second characteristic, the ML 
method gives the estimate that is nearer to its population 
value than by the noniterative method* Nothing of this sort is 
evident for that for small sizes the non- 

iterative method behaves more satisfactorily than the ML method* 

For large sample size, however, the ML method improves the 
estimate of F considerably* This is as ej^pected because the ML 
method generally does better for large sample sizes* 

3*6*2 Results on -predictors : Results on prediction are 
sunimarized in Table 3*3* The comparison of predictors is based 
on MSD* In all the four cases the minimxm MSB predictor with 
known parameters (ii) performs the best (i*e* has the smallest 
MSD)* The predictor based on the noniterative estimates without 
introducing a constant multiplier (iv) is better than the predictor 
based on individual ji. (i). The introduction of constant 
multiplier c (iii) gives better results than Civ), as one will 
expect, in three cases* But for the sample of size 21 and 90 
percent variability explained, wo see that the introduction of 
c increases the MSD value* This is the case ■^ere even (i) 
performs better than (iii)* This observation does not fit in 
to the general pattern* For sample of size 21 the predictor based 
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on the ML estLTiates has the highest MSD, However, for sample 
of size 90 this predictor does quite well as one will expect 
because estimates are satisfactory for large sample size. 

In uhis Ccise, its MSB is smaller than for predictor ( iv) but 
slightly largvjr than (iii). 

Looking at the Table 3*3 we observe that larger sample 
size produces bh® ratio of the MSE and MSB vjhich is nearer to 
unity than for smaller sample size for predictors (ii), (iii) 
and (v). Irrespective of sample size this ratio is close to 
one for predictors (i) and (ii)^ For the predictor (v) this 
ratio is far from one for small sample size but is near one for 
large sairple size* This is as expected since the expression 
(3*3*8) gives MSE in the asymptotic case# The behaviour of the 
predictor (iii) is similar to that of (v)# 

3-7 illustrative EXAMPLE 

In this section we explain the method by working out 
an example- The data are taken from a survey conducted by 
Indian Council of Agricultural Research on preharvest forecast 
of yield of sugarcane in Meerut district in India in the year 
1976-77 (see Experiment 4)* The data consist of measurements 
of cane heights (in meters) and diameters (in centimeters) of 
randomly selected plants from 33 plots of 25 squares meters 
each* Five records are available on height and diameter fretn 
each sampled plant# Final yield of canes, giving the total 
weight of sugarcane, and number of shoots for each plot are also 
available,# 
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A second degree polynomial in time appears to give 
an aaoquats fit for the height* A second degree polynomial 
for the diameter is also found adequate* However, the diameter t 
do -'jOt show significant correlation with yield and arc not 
considered in great detail for the present analysis. For 
sugarcane yield, number of shoots per plot is considered as 
a covariato* For height no covariatas are considered. For 
tho i— th plot y^j^ is a 5x1 vector corresponding to height at 

ro 

5 rime points, gives final yield of tho plot and gives 

the number of shoots in the i*-th plot. In terms of the above 
notation 


hi 


^1 ^ ^12 = ^ ® i 2 ' 


where k-th row of A^ is (l,k,k'); 




’!i = *' = "2 ’ ’'“‘Si’ “ 


The comprehensive model is given by 

^ 2i + ' 


where 


5 

Aj 

o" 



3 ' 

“ I 

0 “ 

A =s 



! 

Xi = 

1 

0' 


1' 

0' 

1 : 






3 1 3 . 2 
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The maximum likelihood estimates of r, n and oj ana 
Obtained by iteration and convergence does not create any 

special problem. In this case o\ is not estimable explicitly. 

The estiuiates are 

/s 

= ^0* 38,0. 49, -0,0 4, 17, 10,0, 25) ; 

“ 0,531 

-0,339 0,266 

^ .-.o 

var(r) = 10 ''x I '0.046 -0,337 0.005 

1,753 0,330 0,055 1216,834 

_ 0.000 0,000 0,000 -9,303 0.080 

0,192 

-0,112 0,088 

0.015 -0.012 0.002 

0.581 -0.109 0.013 42.099 

No test is made to check the adequacy of the above model- 
A visual inspection of simple covariance matrix of heights and 
that obtained from the above fitted model do not show much 

A. 

discrepancy. Inspection of A suggests that second and third 
consonants of can be considered as fixed parameters. 

The above growth curve (G.C.) Model was used for 
conditional prediction. The yield of the i-th plot is 
predicted using heights upto time t (t =* l/.,.,5) and is 
denoted by The value of MSD. given by N. S(y 2 i“yjfci^ 

where summation extends over all values of i^ has been calculate^ 
We also evaluate corresponding value using expression (3.3.9) 
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and conpare it with the MSD value (see Table 3 , 4 ) • 

For comparison purposes another simple predictor of 
obtained by regressing height at each stage and number of 
shoouS/ is calculated# Such predictors based on multiple 
regression (i'I#R*) method are in current use# They, however/ 
do not include ciny information on previous stage because of 
complicat ons due to coll inear ity# Growth curve approach for 
prediction has many other advantages over M#R» approach# For 
the latter one has to develop different prediction formulae 
for different stages# Moreover# equations developed for one 
set of points arc not suitable for predictions whan data are 
not recorded exactly at those time points# However, the 
prediction from G,C, approach is free from such limitations# 

For these data there does not appear to be any systematic 
decrease in the prediction error as data on more time points 
become available# This is also true with M#R# method# This 
is perhaps due to small size of N in conparison to the number 
of parameters estimated# When all the auxiliary information 
due to heights is ignored then the MSD is 42# 10 which shows 
that heights do not reduce the MSD considerably in this case# 

When diameters are also taken into account then the MSD 
for G#C# method is 30#71 while thatfor M#R# is 37# 88, 
corresponding to the fifth stage. M#R# method takes into account 
the height and diameter data of single stage and number of 
shoots# When data on all the stages are taken into account. 
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TabJe Mran Stsuarcs Error of Predictors for Multiple 


Redress ion (MR) arid Growth Curve (GO Methods* 


Siade 

Msn 


MSD 

MSE 


(MR) 


(GO 

(GO 

3 

3G,63 


37*33 

43*29 

4 

36 ♦ S15 


35*70 

43*16 

5 

30*64 


37*25 

42*91 
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the HSD of M,R, method is considerably reduced to 16 * 62 * This, 
hov/evcr, cannot be relied upon as the number of parameters is 
12 from 33 observations* 

G,C, method may be preferable over M,R, method for the 
reasons given aoove, though for the present data there is not 
much to cnoose between them on the basis of MSD, 

3*8 DISCUSSION AND SUMI^IARY 

In this chapter we discuss a generalised RCR model which 
allows different design matrix and different dimension of 
regression parameters for each characteristic* The random 
regression coefficients for each characteristic may also depend 
upon different set of auxiliary variables* This generalises 
the results of Reinsel (1984a) in many directions* Such models 
have potential applications in many situations* A possible 
application in preharvest forecasting has been considered in 
great details* Working of the proposed method is illustrated 
by using real data in Section 3*7* 

The maximxim likelihood procedure is adopted for estimation 
of parameters* The estimates are obtained by iteration* Optimum 
properties of ML estimators have also been discussed* Some 
simple noniterative and unbiased estimators/ which may provide 
starting values for iteration, are given in Section 3*4* The 
results of simulation study indicate that for estimation of y , 
the noniterative and ML estimators are close to the true values 
and both the mothods are satisfactory* For small sample size the 
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noniterative method provides more satisfactory estimate of P 
than the ML method. For large sairple size, however, the ML 
method pr.cduces much improved estimate of P as can be expected. 

Problems of conditional prediction are discussed in 
detail in Section 3.3. An asymptotic expression for MSE of 
prodiCL-ioii (see equation (3#3*8)) is derived. On the basis of 
s imulat io’i study carried out in Section 3.5 we find that this 
approximation is satisfactory for large sample size. In 
siraulation study the performance of various predictors, including 
tho empirical Bayes type predictor and predictor based on ML 
estimates, are also compared* Por small sarrple size the performance 
of the empjjrical Bayes type predictor is superior to that of 
the one based on ML estimates. Por large sample size both are 
close competitors. 

Reinsel (1982,1984a) has considered this model with the 
same design matrix for all individuals and same covariates for 
all the characteristics. He has obtained least square estimator 
of r which is also ML estimator and is noniterative in nature. 

The noniterative estimators considered in Section 3.4 are similar 
to the least square estimators of Reinsel. Prom the above 
discussion it appears that this method has a clear advantage 
over ML mothod for the parameters considered here, ^art from 
the simplicity in computation, atlcast for moderate sample size. 

The studies on simulation experiments are not exhaustive. 
Moreover, the simulation results are based on a very small nxiiriber 



78 


of experiments* The i 

ne study IS taken only to confirm the 

theoretical results and in mo<?t o-f -f-v, 

n most of the cases considered 

the simulation r'=‘sulte! t •. . , 

n r-suits agree well with what is expected 

theoretical grounds* 


here- 
on the 



appendix 3.1 


Derivation of axprassion (3.3^7) 


^ A 


Zo 2 ~S 2 - - u,(p+u,)-i) (; -X 


0 o 


Dow v/o use the following approximation, 

(P+Uj) (P+Uj) ^ = Uj (P+U^) (P+u^) “^(p+u ) (P+u ) 


tCUj (P+u^) -^^ty^CP^u^) -Xj J J 

~ f ^i-^l “^(P+Ui) } (P+Uj)'""^ 


-i 


%2 -^o 2? %2 -Y^o 2> '1 




- '\)2 ^ ^X’+Uj) (P+Uj^) } Q { (P+Uj^) “^(P+Uj) } ^ ] A ' 2 

“ "^02 -ECUjQCP+U^) (F+U^) ^Uj}-E{Uj(P+U^)"^(F+Uj)qU^j 

+ E£ (P+Uj) (P+U^) Q(P+Uj) (P+Uj) ] 
where 


A' 

“^>2 


Q » (P+Uj) ^ [(P+Uj)+X^{var(y)} X'](F+Uj)*x ^ (Q_) 
Next, 


A A 


E(UiQU^) = U^QU^+P 


(±) 


wnere 


diag(g2Pj,..»,gj^Pj^) J gj = 2{N(pj-qj.)} 
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Her^ Ajj J.s the design matrix eorrespondlng tc the observed pert, 
of j—tii characteristic jn v 

^ol' 


A Q A t i; 


[a QA + Tr(AQ) A ] + (U^-U) QA+ Aq(Uj~U) 


+ (Uj~U)Q(u^-U) + D 


F+Uj) q(P+Uj)+H 1 [i\Qi\ +Tr(AQ) a] + D (ii) 


wnerci 


D = [ (A'Aj) -1 - (A' A) -1 ] Q [ (A^A,) ‘I - (a'A) "1 ] . 

E [U^Q(F+U^)] = e[UjQA+ UjQ(Uj-U)] 

= U^QA + UjQ(Uj-U)+P-a? 

= UiQ(F+Uj)+P^ M<<> 

where 

Using (i) , (li) and (iii) we obtain 

E [(v -V* ) (v -v^ )'l 

£or lo2^ J 

- (P+U^) AQA(F+U^) "^Uj+Tr{ AQ} U^ (P+U^) A (P+U^) 

+ P~U^(P+Uj) "^P-PCF+Uj^) ~^Uj+Uj(f+Uj_) "^t+Kp+u^) 

+ Uj, (F+Uj) "^DCP+Uj) "^U^ ] A'2# 



CHAPTER IV 


A TVJO-STAGE AUTOREGRESSIVE MODEL FOR REPEATED I4EASUREMENTS 
4,1 INTRODUCTION 

Most of the economic time series analysis is confined to 
asymptotic x'esults based on a single large series of observations. 
However, in the study of biological and physical processes over 
time, it is moro likely that data on replicated series over a 
relatively small interval of time may be available. It is of 
interest to estimate the parameters assuming that they remain 
same over units. A vast literature on growth curve analysis, 
wherein it is assumed that the parameters do not change over 
time, is available; for recent reviews see Ware (1983) and 
Khatri (1985). However, in some situations it may be more 
appropriate to consider that the underlying parameters change 
over time in some regular fashion# Anderson (1978) has 
considered a multivariate autoregressive process for repeated 
series and obtained the ML estimators of the parameters# He 
has also developed procedures to test the nature of autoregressive 

paromoter matrix. 

Prom a practical point of view, there is a need to 
generalise Anderson^s (1978) single stage model to a two-stage 
model# To have some insight into situations where such 
generalisations may be required, consider the problem of 
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predicting yield of perennial trees* The yield of a tree may 
depend on some auxiliary variables such as age, weather condition 
and may be expressed by a regression relationship for each years 
Moroovnr, the regression coofficients for each tree may change 
from year tu year in a regular fashion due to process of aging, 
which may bo represented by a multivariate autoregressive 
process* In this chapter we have considered such a two-stage 
model for repeated measurements* 

The two— stage model that xats discuss here is similar to 
the well ]<nown Kalman filter model, e.g* see Kalman (1960), 
Duncan and Horn (1972), Sallas and Harville (1981), Meinhold 
and Singpurwalla (1983) and West ^ (1985)* In Kalman 

filter models there is only one series of observations and the 
transition matrix is assimed to be toiown* This may be a valid 
assumption for processes governed by physical laws but for 
biological processes the transition matrix may not be Known* 
Availability of a number of series enables one to estimate the 
transition matrix as in Anderson (1978)* Other related worK 
with random coefficient time varying parameters in the context 
of time series is reported by Rosenberg ( 197 ^ a, b)* Swaray and 
Mehta (1977), Swamy and Tinsley (1981) and references therein. 

In section 4.2 we propose a two-staga model. An EM 
algorithm for the calculation of maxtarai llkallhood (ML) 
estimates of the model parameters is derived m Section 4-3. A 
noniterative method of estimation, which may provide initial 
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values for ML procedure, is suggested to Section 4.4- Section 
deals V7itn the problem of prediction# Section 4#6 describes a 
Monte Carlo simulation study. The results of simulation study 
are d.'.scussed in Section 4»7* An illustrative exairple is 
presented in Section 4.3» The discussion and siimmary are given 


4 -.'^ 


in Section 4*9# 


4,2 MODEL 


Consider a two“stage model in x-jhich at the first •-stage 
an observable xl random vector for the nth individual 
at time t is expressed by a regression relationship 


^oct “ ^at £o£>t ^t t ^ t = »•#!'? 

(4#2#1) 

where }^,j.is a PQr-t ^ design matrix; is a qx 1 vector of 

random regression parameters/ is a p^xl error vector* 

oct 

We assume that 




0 








2 . . 

where 6 is the Kronecker delta and a is a positive scalar? 
^at %'t' xincorrelated for all a, a' and for all t, t'* 

At the second stage we assume that follows a first- 
order Markov process with zero mean and may be expressed as 


' “ = 1.2,-.. A t = 2,3,. ..,TJ (4-2.2) 

where G is a q x q matrix and is a sequence of q x 1 

independent (unobservable) random vectors with 
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where V is a symmotric positive definite matrix. 


We also assurae that is distributed normally as N(0,Rj) , When 
characteristic roots of G are less than one in absolute value 
then the. process is stationary. 

In thfei above formulation it has been assumed that mean 
® zero. In practice the mean mav not be zero and it 
may have to be estimated from the data. Some modifications 
are requ:3.red in the estimation procedure, when mean is nonzero 
and arc given in Section 4,3,2, 

Anderson (l978) has considered only a single stage model 
corresponding to relationship (4,2,2) but making G and V 
depending upon time as G^. and The two-stage extension of 

this model is similar to Kalman's filter models. In Kalman's 
filter model G.^. and Vj. are assxamed to be known which may be a 
valid assumption for the processes governed by physical laws 
but in biological processes G.^, and Vj. may have to be estimated 
from the data. In this study we have consider^ the case when 
and remain same over time as well as over replications. 
Availability of number of replications enables one to estimate 
G and V even for a relatively short series/ as in the case of 
Anderson (l973) » 


4,3 ESTIMATION 

T 

Define a E vector y^ and a T.q vector as 

tsl 
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Assuming errors and to be normally distributed, the 

conditional probability density of y„ , given ^ , can be 

<roCC 

written as 




and the marginal density of 3 as 

injCG 


(4*3^1) 


- N(0,A), 


(4*3»2) 


whore 

where U is a lower triangular matrix with elements 

U = (Uj_j) = (G^ ^), i > j and = diag{R^,V,#,«,V) # 
Combining C4*3#l) and (4#3»2) and integrating over 13^“' s, the 

CoCIr 

likelihood of parameters, given Y*s, a = after 

some manipulations, can be written as 

N , 

L « n fCy^lRi,v,a^,G) 
a=l ^ 

^ a=t 

X exp t - f (o"^ y^Jo - la (4.3.3) 


a.rsl 


where 
N 


p » S Pq,! * A’"^+a‘”^ diag(x'3_ W* 


a=sl 

Sa = Za‘ 
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lihood IS a oomplicatea function of parameters 
and the Mt solutions t, differentiation are not eas. to ottain 

An ct C 2.V0 ITlwilllOCi T.rh-*/-iU n 

not require the use of 

likelihood function as such, is v^a fm -au r 

t as vaa Ei4 algorithm (see Dempster 

jJ:*/ 1977) and is dsscr-ib?=H 4 -t, 

uwScriJDsd in the next subsection* 

by ai alrrorlfnm s To describe the steps 
Of the El, algorithm for the model of Section 4.2, rewrite the 
equations (4,3.1) and ( 4 , 3 , 2 ) as 


\nx^+ah x^A 




(4*3*4) 


After has been observed inference about , when A and 

are given, is characterised by the conditional distribution of 
g^’ven , i.c* 




where 


= ''5C'(x„a xJ+a^D-l y , 

fo 

= n-AX'CX^AX'«2i)-l x„A 


l4»3.5) 


(4*3*6) 


If £q^'s were observed we would then estimate G,R^,v and 
follows : 
g’®" = vecCG"^) 




t «2 a 


t =2 a 


oat «^t-l 


(4*3*7) 
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T 


N(T«1)V'' = E E )0 w 

t=2 a «vcct-l ^„at !:n;-fc-i‘' ' 


'«at 


(4,3.3) 


nr: 


Tpa^'' 


S 


a 






C 4. 3.9) 
(4.3.10) 


v.Thore voc(A) is a colujim vactor formed by stacking the rows 
.,jf Af a.iQ ^ is a Kronecker product of matrices. ThasQ 
estimates cannot be forme<3. because are not Imown-# 

€*oCX 

^'St £ be a vector of unknown parameters. The EM 
algorithm forms as a solution of (4.3.7) - (4.3.10) 

using the e^qsected values of the moment matrices of 0 . ^s, 

«^JCXt 

conditional on 6 and the observed data# It is not difficult 

<ro 

to see that 


^ ' ‘Jat-Wat’ ' Ja ' 


(oct at&at 


•at""at a,tt^ 


where ^a.,ij a qxg matrix and is a 

suitable vector component of 

Now all the terms necessary for formxng the conditional 
expected value and the EM algorithm are conpletely defined. The 
estimation step is given by (4,3.5) and (4-'3.6) and the conditiona"* 
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expected values of the terns or. the right hand side of (4,3.7) 
(4.3ao) provMe the maximization step. 

^ i ? . ^i-£ . j-gations when mean is not zero ; We suggest 
modifications to be made when the mean of is different 
from zero* Consider a general mixed model of the form 

.^at ^ \t&xt’^^t ' 

£at “ *^^,t-l'^I]at * 

In such a setup one has to estimate additional parameters fi from 
the data# After y„ has been observed dLnference about 3 . when 

^ «sjCX 

A / d ‘ and ^ are given# is characterised by the conditional 
distribution of i3„ given y^ . i#e. 


~ •'‘la 'V 


where 


L = " 'Za - 


(4-# 3«ll) 


= Kx 


n 

If were known G,R^,V,o’ and ju. are estimated as the followl; 

<sgt CX^ X 


g 

fO 


g = vec(G) 

T -1 ^ 

]*{ 

t =2 a “ t =2 a 


(I ® ( r s ^ 2: (gcct® 


N(T-i)v= S r C|af^£at-l> ‘gat-®£at-l> ' 

ts =2 OC 

^ £al?Al ' 


(4# 3* 13) 
(4» 3# 14) 
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^ t.=l a ^ ~ ' (4.3.15' 

»g =r £(z;v"' 2'(y^ -X^)!. 


Tho conditional axpaoted va.l.uos of the terras on the right 
hand sido of oguatlons (4,3.12) - (4.3.16) can be obtained 

without much difficulty* 


^*3*3 G o.mments..on the proparbies of the estimators ; As we 
are dealing with multivariate normal distribution^ usual 
assunptions for the I'^IL estimators to have optimum properties 
are satisfied* Thus the estimators are consistent for large 
N* However, for fixed N, and T tending to infinite, one may 
have to inpose condition of statinarity and taiowledge of 
for proving the consistency of the estimators# As for as 
the information matrix is concerned one may have to obtain 
it by numerical differentiation as analytical expressions are 
very conplicated* 

In the present problem the possibility of process 
converging to a local maximum cannot be ruled out* Under these 
circumstancesthe choice of starting value can be very crucial# 
If one starts with initial estimators which are consistent, 
then under certain usual assunptions of the maximum likelihood 
this process is botind to converge to estimators which are 
consistent and asymptotically efficient (Lehmann, 1933, 

Chapter 6)* In the next section we describe a method for 
obtaining consistent estimators which can be taken as initial 
values for the algorithm* 
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4.4 A NONITERATIVE METHOD OP ESTIMATION 


In the above section we have considered the maximum 
likelihood estimation procedure which is iterative# To initiate 
the iteration initial guess values of the parameters are 
required# In this section we describe a method of obtaining 
noniterative estimators/ vrhich may provide good guess values 
of the parameters, when all are equal to, say, X and 

(= PQ#sa-y) ^ q* We also discuss their properties# In 
this case reduces to X.;f. = diag(X,X, . * # ,X) and correspondingly 
to C# 


The likelihood of parameters aiven y^'s is given by 
(4# 3# 3 )# After some matrix manipulation it is not difficult 
to see that (4# 3# 3) can equivalently be written as 


( 27 icr“) 



X IRj 



X 

SQC^^)] 




« L J X 


where 

I,, = x;y^. 

Prom this it is clear that Tj = S Q(£a) ^ JoEa 

sufficient statistics. Given the distribution of 
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^ has a x distribution on degrees of freedom 

and is distributed independently of 3 • Prom the nature of 
the likelihood it is evident that, givoi parameters in A 

can be estimated by T 2 only* An \anbiased estimator of < 7 ^ dkt 
obtained as 


ss S Q(Pq^) pHlg* 

For structured matrix A ^ which is a function of G,R^ and V, 
the estimators of parameters cannot be obtained in a closed 
forirUi Even the differentiation with respect to the parameters 
G,Rj and V is not easy and do not result in equations that may 
be easy to solve# However, considering only it can be 
seen that 

, &xp [ - i a ^ Tr{ ( l+c7^P' (x'x^) "*^P) ( E i ] 


where P is such that A**^ a PP''# It can be seCTi that 

' « [kl^l^- Sa2-^al’ ^ 

In cases where all'diO latent roots of less 

than ■unity In absolute value, we may ejqjand. 

[l+cr^P'Cx'X )'’^P]**^ in the powers of P'(X'X ) ^P» The 

^ * 41 * 

dominating terra in the exponent term of L2 
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which IS tho same as in the ej^ression of likelihood in 
Anderson (1978) with replaced by The estimators of 

Rj/G and V can be obtained by expressions similar to (3*7)/ 
(3*12) and (3#13) of Anderson (1978), respectively, i.e. 


^Ic ” ^1^°^ ^ 

= C(l) , C*(0)‘"^ * 


T 


N(T^l)v^ ^ ^£at“Gcgat-i) (iat<W ' > 


where 


T 


(T-l)C(j) = E C. (j) • 
t=2 
T 

(T~l)C-'''-(j)= E C^-lCO) J 
t=2 


A 

EO . y = w 

fsjOuX- 


(4-4#l) 
( 4* 4» 2) 


Prom bhe definition of i3„ it follows that 
> = 2» = "S: + O^CX'X)-!, 


Now it can be seen that 


E[Cj(0)] « Rj+C7^(X'X)'*^ I 
e[C(1)] G(R3^+R2+.#«+Rj^j)-^ 

E[C’^(0) ] = [ Rj+R2+#*#+R,j„j+(T“1)cJ^(X'X)‘"^ ] • 

T-1 

The above expressions suggest that R^ and can be 

estimated unbiasedly by and respectively, where 
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_ c^(o) - a^(x'x) = c*(0) - (a?-i) a^Cx'x)”^, 

Howovar, in some cases both and may fail to be positive 
definite. A better estimator of G can thus be obtained by 
replacing C^^O) by in (4.4.1), The estimator is given by 

G = C(l) 

By using law of large numbers it is easy to show that for fixed 
T, C(3) and tend to their expected values i.e. as N ^ oo 


C(l) G (Rj^+. » »+R p^j) 5 

-2 > (R^4-...+Rp^^) . 

Thus 0 gives a consistent estimator of G. However, for finite 
may turn out to be nearly singular in some cases, and in 
such cases it may be advisable to modify it so as to make it 
irrecertible. 


4.5 


PREDICTION 


Suppose y- has been observed upto time T and one wishes 
to predict the values of ® knotvn then 

the minimum mean square error predictor is just the conditional 
mean, given by 

^aT+h * ^+h ^ & ' 

where is the vector of last q components of gj = ^(£^.1^^), 

given in (4.3*5). 
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The prediction variance is 

h-l 

S 

i=0 

Since G is not known its estimate is substituted to give the 
predicted value of as 


+ oV'"'] (4.S.1) 


ExT+h = 


It is not oasy to evaluate the prediction variance of 
^aT+h* variance evaluated by substituting 

the estimated values for parameter values in (4*5,l) will be 
an underestimate because it does not take into account the 
variability due to using estimated value of G# 


4.6 MONTE CARLO SIMULATION STUDY 

To compare the performance of the simple noniterative 
estimators and the ML estimators/ a Monte Carlo simulation 
study is performed. The data are generated in the following 
manner# 


The data vectors of dimension 4 (i*e» = 4 for all a 

and t) are generated for T=4# The regression relationship 
at the first stage is considered to be linear Ci#e# q = 2) # 
The design matrices taken to be the same for all 

a and t as 



The ^ 's are considered to have been generated from a bivariate 
normal distribution with mean 0 and dispersion matrix R^i/given by 
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R 


1 


0.40 

0.00 


0.00 


0.40 - 




Two cases are considered for second stage model viz., 
the stationary and the non station ary# In the stationary case 
we have R^=R 2 =* . . =^^=R (say)* This inposes the condition that 
the cnar ace eristic roots of G be less than 1 in absolute value 
and V satisfy the conditions that V = R-GRG'. The autoregressive 
matrix G is taken to be 


G = 


giving V as 


V = 


0,60 0,00 

0.00 0 , 70 _ 

0.256 0.000 ■ 

0.000 0,204 


which is symmejtric and positive definite. For the nonstationary 
case G is taken as 


G = 


1.20 0,00 

0.00 1,32 J 


Two different values of V viz,/ 


V 


’0,256 0.000 
0.000 0.204 


and 


“0,16 0.00 “] 

_0,00 0.16 J, 


are chosen to enable coitparison over V. Moreover, the value 


V 


0. 256 0,000 
0,000 0,204 
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is the same as for the stationary case so that the comparison 
over the values of G can be made* 

Tne cjnount of variability at the first stage (a^) is 
chosen in such a way that, on the average, (y^^lX^^) explains 
a predetermined percent of the total error* "’por the stationary 
case four situations are considered} a moderate sample size of 
30 ana a large sample size of 95, each combining with two levels 
(60% ard 90%) of variability explained at the first stage* For 
the nonstationary case we restrict the studv to 90% variability 
explained at the first stage* Tv/o values of V and two sanple 
sizes of 30 and 95 give foup combinations of parameters* 

For generating random normal vectors the subroutine 
GGNRM of IMSL, which generates multivariate normal random vectors 
with 0 mean and specified dispersion matrix, is used* We 
conpare two types of estimates viz*, the ML estimates via EM 
algorithm (Section 4*3) and the noniterative estimates (Section 
4*4) t on the basis of outcome of 100 simulation runs* These are 
reported In Tables 4*1 4*2* 

4*7 RESULTS OP SIMULATION STUDV 

The results of Monte Carlo simulation over 100 runs are 

summarised in Tables 4*1 "■ 4*2# Perusal at these tables reveals 

2 

that for estimation of parameters G,Rj^,V and o , both the 
procedures are satisfactory for stationary , as well as nonstationar' 
cases* There was no need for modification of M 2 as it was always 
well behaved for the purposes of inversion* It is observed that 


Methods the Noniterative(NI ) end (ii)the Mejtimum 
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on the basis of MSD there is no indication that one method is 
uniformly better than the other^ The noniterative procedure 
is corrputationary very siirple and provides very good estimates* 

In fact/ noniterative estimators give more satisfactorv results 
than the ML procedure particularly when sanple size is small* 

These results are very much on expected lines* The EM algorithm 
locates the region of convergence very quickly but later on the 
convergence of this procedure is rather slow# 

For stationary case it is observed that larger percent 
of variability explained by the model produces smaller MSD values* 
Same is the case with sanple size i#e# larger sanple size 
produces smaller MSD, Both the observations follow expected 
patterns# 

For nonstationary case we do not study the effect of 
changing the percent of variability explained by the model# 
Instead# we study the variation due to change in V* No systematic 
change in MSD is observed by making V smaller# In this case 
also the larger sanple size yields smaller MSD as one will 
expect# 

4# 8 AN EXAMPLE 

To get some insight into the performance of ^ 

predictor of we illustrate the method suggested Ih this 

chapter by a small example. The data reported by Rao (1931). 
which are originally due to williams and Isenman (1981) (see 
Experiment 2) , are used. The data are not adequate for the 
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present model but have been used for illustration purposes only.^ 
The data consist of weights of I 3 mice measured at intervals of 
3 days over 21 days beginning from the third day^ Only a 
Single measurement at each time is made giving p^^ = 1 for all 
a and t# We consider a mixed model of the form 

^at = , a=:l,,.,^13t t=l,2,..«,6: 

which may be written as 

y t = (l/2t-7)M + (l/2t-7)l3„. + e„. . 

P^^'s are assumed to be related as 

^at “ ®^t“l 2at* 

Parameters are estimated from the data upto 18 days^ These 
values are used to predict the weight of each mouse on 21st day^ 
The predicted weights are coitpared with the corresponding observed 
weight of each mouse# 

The maximum likelihood estimates are obtained as follows: 
jU « (0#5793# 0#0654)' # = 0#5481 xio”"^ f 


“ 1*2779 

-1.8779" 

-1 

1 R- = 10 -^X 

" 0.2671 

0*0674” 

^0#1160 

0.5088 _ 

1 

_ 0.0674 

0.0176 _ 


^ ^ 0#1009 0#0217 

V = 10 X 

^ 0#0217 0#0049 i # 

The EM algorithm locates the region of convergence rather 
quickly but later on the rate of convergence is very slow* 
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To get some idea how good the above dispersion structure 
fits -Che data, we have performed the lilcelihood ratio test. When 
there is no restriction on the dispersion matrix, the maximum 
likelihood estimators are given by 

II = (Z's ^Z) ^Z'S ^ y I var(y) = [S+N(y-z|.) (y-Z^)] /N, 

where S is the 6x6 corrected sum of squares and product matrix 
and y is the 6x1 sample mean vector^ Number of parameters 
fitted in the unconstrained case is 23 and corresponding 
maximum value of the loglikelihood is 138.9376. For the above 
structured model 13 parameters are fitted giving the maximum 
value of the loglikelihood as 131.5795. This gives likelihood 
ratio test statistic 

-2 log A = 12.7162 , 

on 10 degrees of freedom, showing that it is not significant. 
Hence, it appears that this model gives a reasonable fito The 
number of observations is too small and it is difficult to 
put much faith on the sensitivity of this test in the present 
case* 

The estimated values are used to predict the weights 
of mice on 21st day (h=l). These values are then conpared 
with the known observed weights on 21st day. The sum of 

«v> -2 

squares of deviations, given by S'yiT+h"^iT+h^ ' where 

and y^T^j^ are observed and predicted weights of the ith mouse 

at (T+-h)th time pointi respectively, is 0.0381* This value is 
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considarably smaller than the corresponding value obtained by 
using best linear unbiased predictor^ James-Stein regression 
predictor, simple regression predictor, eitpirial Bayes predictor. 
These values are reported by Rao (19 81) and the minimum value 
is 0, 0836 for empirical Bayes predictor# The value reported 
in Rao (1981) is 0*0951 which is in som.e error due to omission 
of some factor (Personal communication with C#R# Rao) • 

The prediction variance obtained from (4*5*1) comes out 
to be 0*0335 whicn is considerably larger than the empirical 
mean sguare error value which is 0*0029* This inflated value 
of prediction variance may be due to the fact that many 
parameters have been estimated from a small data and hence the 
estimates may not be precise* 

4*9 DISCUSSION AND SUMMARY 

In this chapter, a two-stage model is considered in 
Section 4*2 wherein the first stage is the usual regression 
model and at the second stage the regression parameters of the 
first stage are assumed to follow autoregressive process* These 
models have possible applications in prediction in perennial 
crops and other biological processes# The resulting model is 
similar to the well known Kalman filter model* In the latter 
model only a single series is observed and consequently a number 
of parameters cannot be estimated and are assumed to be known* 

In the present work it is assumed that replicated series are 
available enabling us to estimate all the parameters* 
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For estimation of the parameters, the ML procedxire 
via EM algorithm is adopted to Section 4,3. Some simple 
alternate estimators, which are consistent, are considered in 
Section 4»4. Fenformances of these two methods are studied 
on the basis of Monte Carlo simulations.. It appears that the 
proposed alternative estimators perform very well when sanple 
size is small# Even for large saitple size their performance 
is satisfactory* Moreover, confutation ally they are much 
simpler in cjrrparison to ML estimates. 

Ihe proposed model has been used for prediction purposes 
on a real set of data and results indicate that the model may 
have some potential use* However, a lot more additional work 
needs to be done before this model could be accepted as a 
potential alternative to the existing methods for the analysis 
of the longitudinal data. For example, it is very necessary 
to obtain the variances of estimators. A derivation for the 
MSE of prediction of predictors based on estimated values of 
the parameters or atleast some approximation thereof should be 
available. Some test for goodness of fit of the model to the 
data should also be available. 

The number of simulation runs for all the cases is 100 
which may not be sufficient for making assertive commenta. The 
study is not exhaustive either. Our aim is merely to see the 
behaviour of various estimators and in most of the cases the 
observed nature of these estimators follow the lines one would 
expect from theoretical considerations* 



CHAPTER V 


estimation and prediction in mixed models with 

AUTOCORRELATED ERRORS 


5*1 INTRODUCTION 

A gensral family of mixed linear models proposed by 
Harville (1976) , which includes growth and repeated measurement 
models, has been extensively studied in literature, for exanple 
see Harville (1976,1977), Denpster ^ (1931), Laird and 

Ware (1982) and Kackar and Harville (1984) • In most of the 
work related to growth and repeated measurement models it is 
assumed that the errors are distributed as multivariate normal 
with mean 0 and dispersion matrix a^l or a^v with a known matrix 
V. The assumption that the dispersion matrix of errors is a I 
may, in some cases, be too restrictive. If V is a general 
unknown matrix then it may not be possible to estimate V because 
all the parameters may not be identifiable. It is of 
considerable interest to consider a case ^en the dispersion 
matrix of errors is of autoregressive nature in addirio 
30.. coofficionts .ando.. So.e researchers have oons^erod 

such models, for exanple see Swamy C1971) and Mansour gt al. 

(1985) • 

A mixed linear model with autocorrelated errors is 
proposed in Section 5.2 which may be more appropr 
longitudinal studies. In Section 5.3 we obtain the — 
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likelihood (ML) estimators of the parameters which exe iterative 
in nature. The problem of prediction and derivation of its 
moan square error (MSE) exe considered in Section 5.4. Some 
sample ostimators, which provide good guess values to initiate 
the iteration, aro obtained in Section 5.5, Properties of 
these estimators are also discussed. The description of Monte 
carlo simulation is given in Section 5.6, Tho results of 
simulcTtion study are summarized in Section 5.7. An illustrative 
example is given in Section 5.8. The work done in this chapter 
is briefly reviewod in Section 5.9. 

5, 2 MODEL 

Let us consider a mixed effect regression model given by 

yi = Xii+ , i = 1,2/...,N f (5.2.1) 

where is a pxl vector of observable random variables? X and 
are known design matrices of order pxm and pxq, respectively/ 
ju is a mxl vector of \inobservable fixed effects, 0. is a qxl 
vector of unobservable individual random effects with E(£^) =r o 
and varOj) « P and is a pxl vector of errors. 

CaX 

The above mixed model includes growth and repeated 
measurement inodels# For exairple, the random coefficient regression 
model considered by Rao (1965,1967,1975) can be treated as 
particular cases of this model. The situation considered by 
Peam (1977), where the dimensions of fixed and random effects 
may be different, falls within the ptirviow of the model 
represented by (5.2.1). It is possible to consider a more 
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general version of this model where X and differ from unit 
to unit. This type of generalised model is considered by 
Laird and Ware (19S2) and Ware and Berkey (1984), Swamy (1971)# 
Fearn (1975) and Rao (1975) consider this type ox generalisation 
for =X and X being different for different units^^ In the 
present work this generalisation has not been considered as it 
introQuces lot of conplexities in the autoregressive nature of 
dispersion raatrix for e's. 

We consider a case where components of y. consist of 
observations taken at equal interval of time which may give 
rise to errors following a first order autoregressive process. 

For errors we assume that e. •«N(0,a V(p)) where 

V(p) = (Vj,j(p)) =! (d-p^)""^ p^^"J*). (5*2,2) 

Combining (5,2,1) and (5,2*2)# the model can be writtoi as 

y^ = Z b^ 

where 

Z = (Z-:I) I b. = (|3'ie')', 

X <\»X IS} X fsjX 

It can be seen that 

E(y^) ss XjiiJ var(y^) = z D(e)Z'' * q (say)# 
whore 6 is a vector of unJcnown paraineters (l#e# FfO and p) and 

Die) « diag (F^a^(p))# 

Swamy (1971# Section 4,4) has considered a model with 
Z-*X and var(eJ = He has obtained consistent estimators 



i07 


of corrpononts of 0» His sstimabor of is satisfactory if 
large nurnbor of observations (i*e« p large) are available on 
each unit# However, in most longitudinal studies number of 
observations on each unit may hardly exceed eight or ten, b\rt 
it is possible to choose N considerably large (Anderson, 197S) * 

In many cases it may be realistic to asstime that do not 

differ from unit to tanit* In the follovjing we have considered 
such a case for large N# 

For convenience in notation we shall supress the 
dependence of V on p and denote V(P) by V# 

5,3 ESTIMATION 

The parameters of the model considered in Section 5^2 may 
not always be identifiable# One such situation may arise when 
P is a general pxp matrix and in such a case parameters are not 
identifiable# We consider estimation of the parameters under 
the assurrption that the paramaters of the itodel are identifiable* 

For the purpose of estimation of parameters of the model 
considered in the previous section, the ML method is adopted# It 
can be seen that the estimators of M #^,0^ and p cannot be 
obtained by differentiating the liXelihood as it as a very 
conplicated function of the parameters# So the EM-algorithm 
(Derrpster et 1977), which does not require the use of 

likelihood function, is used to obtain the ML estimates# The 
E- and M-steps of the algorithm are given below# 
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If 0 were known, the expressions for estimators of the 
fixed and individual effects are well known (Harville, :.97l) 
and are given by 


• M = (X' X)“^ X' q“1 V ; 

CO 

b.. =D(0)Z' Q^^Cy .-X jU).} 

«■-' ro 

and their variances are 


•0 


(5»3,2) 


=: var( 4 ) = N (X' x)"^- 5 (5,3.3) 

jU 

fO 

= varCbJ =D(0)Z' [Q“^'^'’^X{var(Ii)}X'Q’‘^] ZD(0)^ (5,3»4) 

The estimate of ^ maximises the likelihood based on the marginal 
distribution of the data* Of course, b. is not the maximum 

foi 

likelihood estimator but it can be derived by extension of Gauss~ 
Markov theorem and is enpirical Bayes (Harville, 1976 ), in the 
sense that it has the form 



If we were to observe bj (i.e, 0,. and e.) , we would find 

' CO X rox 

the maximum likelihood estimates of components of 6,. based on 
quadratic forms in as follows r 

F ss S I (5#3#5) 

coiioX 


p as a solution of the cubic 

p^(p-l) - P^(p-2)R- - P[(p+l)V^o3 + pRj » 0 (5,3*6) 

W 



:(.09 


which li3s in (- 3 , 1 ) (Ander 


CS3»0 

a 


(l+?^)K +R- _ 


v/hcre 


1971, p 353) and 
p+Ro “ 2p R, ] 


( 5 3 7 ) 


P 

E } 


/ R-A = r,,„+r 
Since b.'s ax 


-o - ^aa’^^'pp ' = s r,. 


ii+l ^ ^ 


p-1 

S 

-1 - -.e npt observable, we take the conditional exacted 
values piven y, and previous values of « and 6, say and 

It is not difficult to see that 

(k) „(k). 


= (■-,). 


^<SiSllyi'ii‘’=’-g'’^') =ii’^^S<’'>'tvar(eily.,b’=7e<’=>) . 


where 


~(k) 




The equations (5*3*l) and (5# 3*2) provide- the E-stop of 
tho algorithm and the conditional expected values, aiven v 

( Ic ) ro <fo 

and 6 , of the quantities appearing in the equations (5.3*5) 

to (5.3.7) provide the M-step* Thus the algorithm is conpletely 
defined* Those two steps are iterated till the convergence is 
achioved# Denote the converged values as ju and 6(1. e* E.c^', P) 
which gives tho maximum likelihood estimates* 

As 6 is translation invariant and even 'function of the 

IN) ^ 

observations, fX is unbiased for estimating fi (Kackar and 

fa 

Harvlllo, 1984)» Moreover^ for the normal distribution considered 
hero, tho assurrptions for ML estimates to have asynptotic optimum 
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proporti^E aro satisfied, and thus the estimators of P, 
and p are consistent and asyaptotlcally efficient, 

5.4 PREDICTION AND ITS SQUARE ERROR POR A NEW INDIVIDUAL 

us consider the problem of predicting the future 
value of the response of a new Individual given the values of 
Its past response To formallpe the Idea consider predicting 
c e value of the response of the new Individual at time 

corresponding to the design points xf and z' 

belonging to the row spaces o-f v 

P es Of X and respectively* It is 

not difficult to s©© tli^^t •f*‘hcs 

xnxmum mean square error (MSE) 

predictor of v xr 

Vl,0 Yo' If A and e were known, is given by 


Vl.0 = 

= ^2 + ^^2^2'+ g2 ) ^ ^ 


= x' jLi + z' p + v“*^ e 

*s> cm 2 cmO *^1 11 ®0 

where 

Vji = (i-p2)-i (pP,pP-h,.„;p). 


However, If only 8 were too™, then It is natural to consider 


the predictor of y ^ ^ as 

■^p+1,0 


where 


^P+1»0 - ^ + 6^(e) b 

CO ro €o ro coQ 


X - Xn $ 6 (8) = (zi ! V 

to co^ €0 r<* 'i^>2 <^21 ^ XX^ ^ 

It Is easily seen that = (0,0,,.., p). 



Let the MSE of Yp^j^g* It can be seen that 

(0) rs E (v "• V ) ^ 

1 ~ ^^p+3>.0 ^p+i^d 


= (7 4 . 


, £fe)-ix-(x'Q-ix)-i x'Q-i 2 D(e) 6 (e), 

' '■ ]3 i.>. €V» ro 

(5- 4*1) 


ro 


where is given by 

b “b 

coO cjO 


V _ =D(0) -V 

b ~r) b 

roO <oO roQ 


and and are given in C5#3#3) and (5#3#4)^ 
b JU 

cv»0 <s> 

If all the parameters are •unknown then it is natural 
to consider the predictor of Yp^j^g as Yp+^gz given by. 


^■D+ 1-0 = + 6 '( 0 ) bg. 

P*rJL/ ^ CO co’^ 


Thus there is an increase in MSE due to the use of estimated 
values in place of the parameters# Eollowing Kackar and 
Harville (1984) it can be seen that Cy^- r^-y^- J and 

,/s CO 

(y J are independent and hence the amount of increase 

•'p+.l >0 ^p+ 1^0 

in MSE is given by *^(¥ 5 + 1 , 0 "%+:^^' “ 

by Taylor series in © about true values of parameters 6 # then 

CO fs» 

by ignoring higher order terms it can be seen that 

2 - » r j / / r\\ f T 2 




where 
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Now 


= Tr [E{d(e)d'ce)},E{(^e)(e-e)'}l 

r>.i <n; fo CO ‘ f\> rN^ <\> CO 

as d( 0) and § are independent# Thus 


(5^4o2) 


’='Vi,o"yp+to>'" =Tr[A(e).B(e)l 
where 

A( 0) = var [ d( 0) ] 5 B( 0) = E [ ( ^0) (^e) '] . 


C5#4#3) 


rs- cv> <vj <v3 


It may be noted here that Kackar and Harville(i984) have used 
the expression similar to (5^ 4# 2) in derivation of approximate 
MSE for predicting a future observation of the individual whose 
past observations have been used for estimation of 0» This 
approximation may not be valid in general as it implies ignoring 
the terms of the same order as are being retained# They have 
discussed the conditions under which the expression (5#4#2) is 
exactly true# 

The (k/g-)“th element of the matrix AC 6) can be obtained 

CO 

as (see Appendix 5#l) 




(e) 


N 


-1 




# 


where s^s and t^s are as defined in the Appendix 5#1# 

CO 

The value of the matrix bC©) may be obtained as the inverse of 

CO 

information matrix# The terms necessary in the evaluation of 
the information matrix are given by 
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E( 




N -1 

Tr{Z' Q ^ z. — 3 ^ z' Q“^ Z 

k 


3D(e) 
3% ^ 


From equations (5# 4*1) cind (5# 4^*3) we get that 




^ a^-k'v^x+^'(0) 6(0)~|x'(X' Q*"^ X)“^ X' q"*^ ZD( 0)6(0) 

<v^ jj^csi ts> ro ^ iM CO - * - 

coO coO 


CO ro CO 


+ Tr [a(0),b(©)]* 

5# 5 SOME SIMPLE ALTERNATIVE ESTII^IATORS 


(5.4.4) 


In this section we consider two methods of obtaining 
initial estimates. For the sake of sirrplicity we consider 
^1 ” These methods can be extended to a case when Z^ X 
without much difficulty* 


When Zj = X, the model (5.2*1) can be written as 


Xdj + ej , 


where 


Si 


= M + 

ro 


P-i* 

CO 1 


If P (i.e* V(p)) is known then estimates of F,o and U can be 
obtained as in Rao (1975) which are noniterative* In general# 
p is unknown and has to be estimated from the data. Here, we 
suggest two methods of estimating P « These methods aare iterative 
but involve numerical maximization of an objective function of 
one variable (P) only* 



114 


— • In this method p is estimated by assuming (i<.e» o^y 
uo be fixed* If are fixed then the loglikelihood of y_. ■'^s 

can be written as 

log L = - ^ log loglvl --1^ Tr [ v"^ S (y.~Xa.) (y.-Xa,)' ]. 

(5.5*1) 

2 

The iXlL estimates of a,- and a are aiven bv 

a* = (x'v“^x)“Vv**^y. ; a* = S(v,“Xa*)' v~^(y.-Xaf)/N(p-q) 

1 ^ *■ X <\>x "1 1 *^X 

ro csi 

The loglikelihood, after putting the values of and cr^ in 

(5»5»l)/ is given by ■ • 

] 

log L = constant - ~log'Cr^ •- ^ loglVl# (5*5*2) 

The ML estimate is obtained by ntimerically maximizing 
(5.5.2). An alternative method is to differentiate (5*5*2) w.r*t* ’ 
P to obtain cubic equation similar to (5*3*6) and take an appropri— 
ate root, say p . Iterate between and P till they converge 4 

This method is likely to give an underestimate of P and 
the use of this value of p may considerably affect estimation 
of other parameters* 

The reason for getting an underestimate of P is that it ' 

remains the same when c^.^s are known or unknown* When p is 

.^a 

2 

siaall the substitution of a* distorts the estimates of P and o 
considerably, as seen by simulation. The effect of this 

distortion is sjnall when p is very large* Moreover, when p is 

2 

fixed the estlmatoo of structural parameters P and a obtained 
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in this way may not be consistent in presence of incidentaJ. 
parameters a^.'s as discussed by Neyman and Scott (194B)€. 

.2 ! To avoid the difficulties involved in the method 1 
we consioen the estimates oased on certain linear combinations 
of y^s whose expected values do not depend upon fixed effect 
parameters/ as considered by Neyman and Scott (1943), Rao (197?.; 
and Patterson and Thoiipson (1971) » The proposed method is 
known as restricted maxirauM likelihood (REML) , For details 
see Harville (1977)^ 


Let W be any matrix: that is orthogonal to X i»e* 
W'X = Of and define the vectors of residuals r.'s as 

roX 


Ex 



m 


It follows that the first two moments of are given by 

E(rJ = 0 f var(r.) = a\'v(p)w, 

coX ro 

where W is taken in such a way that var(rj) is non-singular 

7 

Estimate of P and a ‘ can be obtained by maximizing the 
likelihood of r^'s# 

This method is free from the limitation of method 1 and 
can be expected to provide better estimate of P • It is not , 

difficult to show that this method gives consistent estimator | 

i 

for largo N. ; 

' i 

When likelihood is based on instead of there | 

are chances that absolute maximum of likelihood may occtir s 

\ 

outside the possible range or very close to the border, [ 

i 

I 

[ 
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UnQsiT thfeSB ciircumstcincQs on^ may lika to clioos© a v ^l ue of p 
in the neighbourhood of boundaries in the parameter space© 


Once the estimator of P (say^ P^ is obtained by tho 
above methoQS/they can be used as though this were the true 
value of p and estimates of and jit may be obtained by 

us?rg Rac'^s (1975) results as follows 




N(p-q)aJ = S(y^-X3^)'(y.-}q3b y 

^ .Sjl ,^1 

(N-.1) (y+a^u). = S - }1J (/. ~ M^) ' 


where 


= (X'V(p^)"’^X)“^ X'V(P^“h I U S= Cx'v(p^)"^)"^ 

2 

^•^hen p is knovm then the estimators of JLI / F and a have the 

optimum properties discussed by Rao Cl975) but when P is nc^ 

known and a consistent estimator is used then, in general, the 

2 

above estimators provide consistent estimators of F and a # 
The estimator of is unbiased as well as consistent# 

We shall call these two methods as enpirical Bayes type 
(EE(l) and EB(2)) methods# 

6.6 MONTE CARLO SIMULATION STUDY 


To study the behaviour of the ML estimates and predictors 
on them, Monte Carlo simulations are carried out# The simulation 
study also enables the coitparison of performance of ML method 
with methods EB(l) and EB(2) • We can also examine the closaiess 
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of expression for prediction I4SE, given by (5# 4*4)/ with MSB 
values# The data are gsnorated in the follovring manner# 


The data vectors of dimension 6 (i»e# p = S) a:'^ 
generated* The design matrices X and are taken to bo ihe 
same and a linear (i,»e# q = rn = 2) fit is considered# Tne 


macrix X is caken as 



1 

1 


1 

3 



4^ 


To study the performance of pr3dictors obtainod by various 
methods, the data corresponding to design point (1,7) are also 
generated# The vector of fixed effects ii is taken as 


11 s= (4-00, 0-75) 

€^> 

The random vectors are generated from a bivariate normal 

distribution with mean 0 and dispersion matrix P, given by 

CO 


. = [ 


2-00 0-00 
0-00 1-40 


Five values of P(~0-40, 0-00, 0-15, 0-40 and 0-75) are 
considered. The value of is chosen in such a way that 
the model explains, on an average, a predetermined percent of 
variability. Two levels (60% and 30%) of variability ej^plained 
and two sairple sizes (30 and 90) are considered- We restrict 
our study to only the selected combinations of. these paraimiters- 

Por prediction purposes, wa have also considered SB 
predictor that would have been obtained if we had considered 
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dispersion matrix of errors as instead of 

As before/ the random vectors are generated usiiig th:. 
subroutine GGNRM of IMSL, One h\mdred runs arc- taken for c:-.ch 
set of parameters# The svimmarized results are given in 
Tables 5#1 - 5»3# 

5*7 RESULTS OF SIMULATION STUDY 

Results on estimation- of parameters : From Tables 5»1 ~ 

5*2 it is evident that the three methods (ML/EB(i) and EfaC2)) 
perform similarly as for as /i is concerned# The estimates of 
P by EB(l) is very highly underestimate# This affects 
estimation of other parameters also# EB(2) appears to provide 
better estimate of P than EB(l)# However, it can be seen that 
for P =: Q#40 (and P * 0*75), P(l/l) by EB(2) is very poorly 
estimated# The reason is that when the data are generated with 
P = 0#40 the estimated value of p by EBC2) comas out to be very 
high (> 0#90) # In such cases we take the value of p^ as 0*9O» 
Whan the estimate of p is high, the matrix (X'V(P^ ^x) ^ 
is illconditioned (i#e# the ratio of the largest diagonal value 
to the smallest is very large) • Since F is obtained by 
substituting o\ times this matrix from a positive definite 
matrix, F could fail to be positive definite# In our case, the 
(1,1) -th term of (X'V(P^ ) “^X) is very large when p^ = 0*90 and 

estimate of F(l,l) is highly negative# Though there are few 
such cases they are dominating, affecting the average estimated 
value of P(l,l)# Because of this, the MSD value for this 
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estimator is also very larger The ML estimate of P obtained 
by EM algorithm is satisfactory for all the cases and never 
gives these type of troubles* 

In all the cases considered here/ I"IL method gives 
estimators with smallest MSDj EB(i) gives sstimators v/ith 
highest MSD (except for P(l/l) when P = 0*40 where EbC 2) 
gives very large MSD values for the reasons described above) 
and EB(2) generally gives better results than EB(l) but pot 
as good as ML method* 

It can be seen that the MSDs atnd asyirptotic variances 

increase with increase in p for all the estimators except for 
2 

estimator of a • This is expected because for P near one 
the information contained in the observations about the 
parameters is small/ whereas reverse is true when correlation 

^ A*, rx /V 

is negative* The asynptotic variances of E/ a and p are 
obtained from the inverse of information matrix* As sairple 
size increases# the MSDs and asyirptotic variances decrease# 
as one will expect* 

The larger the amo\ant of variability explained by the 
model/ smaller are the MSDs and asynptotic variances* There is 
a better agreement between MSD and asymptotic variance v^en 
the amount of variability explained by the model is large than 
when it is small# These patterns are expected* 

5*7.2 Results on prediction : Looking at the Table 5.3 for 
conditional prediction it can be seen that the ML produces MSD th* 
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is smaller ■than by EB(l) and EB(2) methods* Thus the predictors 
based on ML estimates gi've the best predictors though the 
gain over EB(2) is not much* Thus for prediction purposes 
EB(2) method can be chosen without much loss in efficiency,. 

In the last coliamn of Table 5# 3 we give MSD values of prediction 
by Ignoring the autocorrelation in pred^^tion formula/ though 
■they are actually present in the data. Prom the results 
it is apparent that when the correlation is present in the 
errors but prediction has been made by ignoring it then* 
in general/ there is a loss in efficiency# The loss is 
substantial vhen I Pi is high but negligible when P is raoderats*: • 

The formula (5*4# 4) is applicable for a new individual 
case but we have used it for predicting the future response 
of an individual xvhose observations are used in the estimation 
of parameters* Thus the MSDs and MSEs r^orted in Table 5*3 
are not strictly comparable# Infact, the relevant MSD 
corresponding to the value of MSE would be slightly higher 
than the one reported and it has been confirmed by cross 
validation studies* The expression for prediction MSE appears 
to be in good agreement with MSD value and approximation (5*4*4) 
is satisfactory* 

As p increases from *^*40 to 0*75, the MSD decreases 
and so does its theoretical value obtain from equation (5*4*4) * 
Por the values of X and P considered for simulation it can be 
seen that the covariance with immediate successive observation 
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incrsases as P incr'eases# This means that the prediction 
would become more precise as is revealed by reduction in MSD 
and MSE values* However/ this m.ay not be true for all values 
of X and F* 

Greater amount of variability explained by the model 
produces smaller MSD, as one will expect* 

5.8 ILLUSTRATIVE EXAI'^IPLE 

To demonstrate the working of the method described jn 
this chapter, a small exarrple is worked out* The data of 
Williams and Izenman (1981) (see Experiment 2), which give 
weights of 13 (N=13) rats at the interval of 3 days from birth 
to weaning, a.re taken* The data are available for 7 time 
points but for estimation of the parameters data upto 6 (p=6) 
time points are psed* The data at time point 7 are used for 
comparing the performance of the predictors* 

The following model is considered* 

^i = ^ ^ i = 1/2, *..,13, 

with = X* 

A quadratic fit is considered* Tasting the goodness-of— fit of 
the model would be a significant contribution but for the 
timebeing that is not considered* 

Estimates by method EB(2) : The estimates obtained by EB(2) 


come out to be 





r<?hJ(-; W * Observed and Predicted Weidht-s of Mice 
dt b~7 by MeKimum LikelihoodCML) end 
Empirical Bayes type(EB(2)) Methods 


Observed 

Predicted 

Weidhts 

We.i dht 

EB(2) 

ML 

1*191 

1*208 

1 *229 

1. *004 

0*922 

0*919 

0 * 925 

0*918 

0*920 

;l *069 

1*313 

1*126 

0*751 

0*678 

0*665 

0*888 

0*758 

0*775 

0*910 

0*891 

0*902 

0 * 929 

0*917 

0*928 

0*953 

0*871 

0*866 

0*836 

0*870 

0*879 

0*999 

0*989 

0*997 

0*796 

0*633 

0*632 

i.*105 

1*054 

1*068 


MSP 


0*527x10 


-2 


0*537x10 


-2 
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56*55 


F 






lo"^ X 

(0-583, 


0-97 
_ 0-09 
0.065, 


0.08 

0.00 - 0.01 __ 

-0.009)'# 


a^=0* 2 16 xio"^; P*=0*9 33^ 


by i-^L method : The estimates obtained by ML method 

come out to bo 


F = 


yS 

M 



(0.583, 


4.99 

0.96 0.19 

■ 0.01 0,00 

0.066, -0,009) 


0.00 

i 



0.104X1 o'" p= 0,404? 


It can be seen that since the estimated value of P by EB(2) 

2 

is very high, thus the estimation of F and a is considerably 
affected for the reasons explained in Section 5.5. 


Table 5,4 gives the observed and predicted weights of 
rats at t=7 by ML and EB(2) methods. In this case the MSB 
by EB(2) method is slightly less than by ML method. However, 
there is not much difference. This is in agreem^jnt with the 
simulation results# The MSB values are smaller than the 
minimum value (= . 00643 )reported by Rao (l98l). This is not 
surprising because in this model we have one more parameter. 

5^9 BISCUSSION ANB SUMMARY 

♦ 

Swamy (1971) considers a RCR model with errors following 
stationary autoregressive process of order 1 with each unit 
having its own autocorrelation coefficient. His estimators are 
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satisfactory whan thara are largo niomber of observations per 
unito Dug to limitation of time and resources thu observa.tions 
in j-ongitudinal studies may hardly exceed eight or ten but it 
is possible to choose N large- Analysis of RCR models whcn 
errors have known dispersion matrix/ except for constant 
multiplier/ is done by Ro.o (1975)^ In this chapter \jg consider 
a mixed linear model with errors following stationary auto- 
regressive process of order onu when all the units have common 
autocorrolation coefficient# 

As in the previous chapter/ for estimation of the 
model parameters a ML estimation procedure via the EM algorithm 
is adopted* Asynptotic variances of F/<j and p are obtained 
from the inverse of information matrix# Some alternative 
simple estimators of errpirical Bayes type are also considered- 
Performances of these estimates and the ML estimates are studied 
by performing Monte Carlo simulations* 

Empirical Bayes type estimator constructed by Rao (1975) 
have also been investigated for this problem# Two method 
(EB(1) and EB(2)) for estimation of P have been considered^ 
EB(l) is based on the likelihood of all observations assuming 
regression parameters for each unit as fixed# This gives 
considerable underestimate of P for reasons explained in 
Section 5# 5# EB(2) is based on the likelihood of certain 

linear combinations of observations free from fixed effects* 

This method gives satisfactory results for P not too large# 
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For positive and large p this ineth.od sortietimes gives very 

value of p^^# In such cases the estimation of parameters 
is considerably affected# However, the ML procedure of 
Section 5.3 is satisfactory for all the cases# In all the 
cases considered hare ML method gives estimators with least 
MSD; EB(l) generally gives estimators with highest MSD and 
EB(2) gives better results than EB(l) but not as good as ML 
method# 

Conditional prediction on the basis of estimated values 
of the parameters obtained by the three methods is also studied# 

An expression for approximate prediction MSE, taking variability 
due to estimation of parameters# bas been obtained in Section 
5# 4# The expression is complicated but is found to be 
satisfactory on the basis of simulation study# It is also 
seen that there is a better agreement in MSD and MSE as obtained 
from (5# 4# 4) when N is large, as one would expect# As for as 
prediction is concamed,ML method gives smaller MSD than by 
EB(i) and EB(2)# 

The simulation results are based on just 100 iruns<» This 
may not be adequate for a thorough simulation study# Moreover, 
the study is confined to few selected combinations of parameters 
and is not exhaustive# The purpose of study was to confirm 
some of the theoretical results and in nrast of the cases considered 
here there is a good agreement between simulation results and 
what we expect on theoretical grounds# 
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The method of estimation of the parameters can be 
modified without much difficTolty when the errors follow 
nonstationairy autoregressive process of order ones In that 
case the model of Mansour et (1985) can be treated as 

a particular case of this model* 
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Derivation of A( 6 ) 


A(e) = var [d( e) ] f die) = dy.^,,^Q/d 6 


where 


r)J -1 O ~ ^ ^ 

Pll/Vj <S> CJ €vlP 


and ^ and b,^ aro given by (5»3»l) and (5»3«2)^ r 3 f?p 'actively- 
It is easy to observe that 


fo 


-1 -1 ’ 5D(S) 

= -(X'’ Q X) X'P - Z Z'Pv 

3 


and 


r 1 9D(0) . 

° - [i-D( e)z' o"^z] — -7/0 -I 


where 


w -j Z'Q (Yq-XM) 

-1 -.1 -1 -1 

+ D(e)Z'Q -^X(X'Q -^X) X'Q •‘■Z Z'Py 

9 «^. 


P = - q”^x(x'q*"^x)*'^ 


From (i) and (ii) it follows that 


d^^Ce) = 9 yp+ 3 _^o/^®k 


(i) 


(ii) 


= y + t' (v - X .a) 

roiv ^ fojv cdO Cn3 


where 


3Df e) 


' _ -.[ x'“ (©)DCe)z''Q“-^x ] Cx'q"^x) ^x'o“^z z'p 




and 

. 3D ( e) 36' ( e) - 

tt =[ 6 /(e) ci-D( e)z'Q“^z} D(®) ] z'q“^ 


CmJV CO «N> 
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Noting that 

varC-^ -xU) = (a + aj/N! .? 
g) = q^/N 

where 

= X(x'q“^X)"^ X' 

it follows that (k,S,)-^h element of A( 6) is given bv 

+ “£k'0+0l/N>Sj + 25k Oi&J 1 • 



DATA SETS 


I: /'t-'0?r'LiTic.'i 1 1, L ♦ Tho data oT Strenio ^*(1983) are reported 
herej^The daLa are modi Ti cat ion of the data diven in Bojt 
0.91.10) hu addLiiiJ random normal erroru followind N(0 j 81) iri 
the (/jiMdht <i.n iJm) of LO rata of control droup at a weehla 
1 1 1 tf;rv :d| , The mtitfier 'a wojdht ia artificially introduced as 
a i.ovjriate* 

We).dhbi> of rata in the control a’roup 


Week Mother's 

F?at 0 12 3 4 Weight 


61 

72 

118 

130 

176 

170 

6t) 

85 

129 

148 

174 

194 

•'3/' 

68 

130 

143 

201 

187 

46 

74 

116 

124 

157 

156 

A/ 

85 

103 

117 

148 

155 

43 

58 

109 

133 

152 

150 

153 

62 

82 

112 

156 

138 

7? 

96 

117 

129 

154 

154 

153 

54 

87 

120 

138 

149 

72 

98 

114 

144 

177 

167 


10 



13 

r I men L *^ + The weigh ts of 13 male mice irreasured at succesive 
intervals of 3 days from birth to weanind as reported by 


Will iams 

and Tz 

ertmari(1981) are 

diven in 

the following 

table. 

UeiE^ht o 

f 13 mice at : 

intervals 

of 3 days 

1 from 

birth to 

weanind 

No ♦/Days 3 

6 

9 

12 

15 

18 

21 

1 

0,190 

0,338 

0,621 

0,823 

1,078 

1,132 

1,191 

'3 

A«* 

0,218 

0 . 393 

0,568 

0,729 

0,839 

0,852 

1,004 

3 

0,211 

0,394 

0,549 

0,700 

0,783 

0,870 

0,925 

4 

0,209 

0,419 

0,645 

0,850 

1,001 

1,026 

1,069 

5 

0,193 

0,362 

0,520 

0,530 

0,641 

0,640 

0,751 

6 

0,201 

0,361 

0,502 

0,'530 

0,657 

0,762 

0,888 

? 

0 , 202 

0,370 

0,498 

0,650 

0,795 

0,858 

0,910 

8 

0 , 1 90 

0,350 

0,510 

0,666 

0,819 

0,879 

0,929 

9 

0,219 

0,399 

0,578 

0,699 

0,709 

0,822 

0,953 

10 

0.235 

0,400 

0,545 

0,690 

0,796 

0.825 

0,836 

1.1 

0,224 

0,381 

0,577 

0,756 

0,869 

0,929 

0,999 

J 2 

0,187 

0,329 

0,441 

0,525 

0,589 

0,621 

0,796 

13 

0,278 

0,471 

0,606 

0,770 

0,888 

1,001 

1.105 
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1. .vJ^'erliTioi it SJfhe data of Elston and Grizsle(1962) 
are rer-orted here. The data consist of observations 
on ramus heisht (in mm) for 20 boys of aSe 8 years 
at four time f-oints at half yearly intervals* 

Ramus heL?<ht data 


in years 


No* 

8*0 

8*5 

9*0 

9*5 

1 

47*8 

48*8 

49*0 

49*7 

o 

A* 

46*4 

47*3 

47*7 

48*4 

3 

46*3 

46*8 

47*8 

48*5 

A 

45*1 

45*3 

46*1 

47*2 

s 

47*6 

48*5 

48*9 

49*3 

6 

52*5 

53*2 

53*3 

53*7 

? 

51*2 

53*0 

54*3 

54*5 

8 

49*8 

50*0 

50*3 

52*7 

9 

48*1 

50*8 

52*3 

54*4 

to 

45*0 

47*0 

47*3 

48*3 

1.1 

51*2 

51*4 

51*6 

51*9 

12 

48*5 

49*2 

53*0 

55*5 

13 

52*1 

52*8 

53*7 

55*0 

14 

48*2 

48*9 

49*3 

49*8 

15 

49*6 

50*4 

51*2 

51*8 

16 

50*7 

51,7 

52*7 

53*3 

17 

47*2 

47*7 

48*4 

49*5 

18 

53*3 

54*6 

55*1 

55*3 

19 

46*2 

47*5 

48*1 

48*4 
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